Abstract

To improve the recognition and preservation of syntactic complexity in Chinese–English translation, this study proposes an optimized Bidirectional Long Short-Term Memory–Conditional Random Field (BiLSTM-CRF) model. Using the Workshop on Machine Translation (WMT) Chinese-English parallel corpus, an experimental framework is designed around two types of specialized data: complex sentences and cross-linguistic sentence pairs. The model integrates explicit syntactic features (part-of-speech tags, dependency relations, and syntactic tree depth) and incorporates an attention mechanism to improve its ability to capture syntactic complexity. In addition, this study constructs an evaluation framework of eight indicators to assess syntactic complexity recognition and translation quality: (1) average syntactic node depth (higher values indicate greater complexity; typically 1.0–5.0); (2) number of embedded clause levels (higher values indicate greater complexity; typically 0–5); (3) long-distance dependency ratio (higher values indicate broader dependency spans; range 0–1, moderate values preferred); (4) average branching factor (higher values indicate denser modification; range 1.0–4.0); (5) syntactic change ratio (lower values indicate greater structural stability; range 0–1); (6) translation alignment consistency rate (higher values indicate better alignment; range 0–1); (7) syntactic tree reconstruction cost (lower values indicate less structural adjustment overhead; range 0–1); and (8) translation syntactic balance (higher values indicate more natural syntactic rendering; range 0–1). This indicator system enables a comprehensive evaluation of the model’s capabilities in syntactic modeling, structural preservation, and cross-linguistic alignment. Experimental results show that the optimized model outperforms baseline models across multiple core indicators.
On the complex sentence dataset, the optimized model achieves a long-distance dependency ratio of 0.658 (moderately high), an embedded clause level of 3.167 (indicating complex structure), an average branching factor of 2.897, and a syntactic change ratio of only 0.432. On all four indicators it significantly outperforms comparison models such as Syntax-Transformer and Syntax-Bidirectional Encoder Representations from Transformers (Syntax-BERT). On the cross-linguistic sentence dataset, the optimized model attains a syntactic tree reconstruction cost of only 0.214 (low adjustment overhead) and a translation alignment consistency rate of 0.894 (high alignment accuracy), indicating clear advantages in structural preservation and adjustment. In contrast, the comparison models perform unstably on complex and cross-linguistic data. For example, Syntax-BERT reaches an embedded clause level of only 2.321, indicating difficulty in handling complex syntactic structures. In summary, by introducing explicit syntactic features and a multidimensional indicator system, the proposed model demonstrates strong capacity for recognizing syntactic complexity and better preserves syntactic structures during translation. This study offers new insights into syntactic complexity modeling in natural language processing and makes theoretical and practical contributions to syntactic processing in machine translation systems.
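
The abstract names its indicators but does not give their formulas. As an illustration only, the sketch below shows one plausible way to compute three of them (average syntactic node depth, long-distance dependency ratio, and average branching factor) from a dependency parse given as a list of head indices. The function names, the convention that the root has depth 1, and the `min_span` threshold of 3 are assumptions made for this example and are not taken from the article.

```python
from collections import Counter


def node_depths(heads):
    """Return the depth of every token in a dependency tree.

    heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
    Assumption: the root has depth 1 and the input is a valid tree (no cycles).
    """
    depths = []
    for token in range(1, len(heads) + 1):
        depth, current = 1, token
        while heads[current - 1] != 0:  # climb until the root is reached
            current = heads[current - 1]
            depth += 1
        depths.append(depth)
    return depths


def average_node_depth(heads):
    """Mean depth over all tokens (assumed definition of indicator 1)."""
    depths = node_depths(heads)
    return sum(depths) / len(depths)


def long_distance_ratio(heads, min_span=3):
    """Share of dependency arcs whose head and dependent are at least
    min_span positions apart (assumed definition of indicator 3)."""
    arcs = [(i, h) for i, h in enumerate(heads, start=1) if h != 0]
    if not arcs:
        return 0.0
    long_arcs = sum(1 for i, h in arcs if abs(i - h) >= min_span)
    return long_arcs / len(arcs)


def average_branching_factor(heads):
    """Mean number of dependents per token that has at least one dependent
    (assumed definition of indicator 4)."""
    children = Counter(h for h in heads if h != 0)
    if not children:
        return 0.0
    return sum(children.values()) / len(children)


# Hypothetical parse of "The model that we trained works":
# The->model, model->works, that->trained, we->trained, trained->model, works->ROOT
heads = [2, 6, 5, 5, 2, 0]
print(average_node_depth(heads))
print(long_distance_ratio(heads))
print(average_branching_factor(heads))
```

Under these assumed definitions, deeper relative-clause embedding raises the average depth, and a subject separated from its verb by an embedded clause counts as a long-distance arc.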

Details

Title
Syntactic complexity recognition and analysis in Chinese-English machine translation: A comparative study based on the BLSTM-CRF model
Author
Publication title
PLoS One; San Francisco
Volume
20
Issue
6
First page
e0325721
Publication year
2025
Publication date
Jun 2025
Section
Research Article
Publisher
Public Library of Science
Place of publication
San Francisco
Country of publication
United States
e-ISSN
19326203
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Milestone dates
2025-01-07 (Received); 2025-05-16 (Accepted); 2025-06-12 (Published)
ProQuest document ID
3218425698
Document URL
https://www.proquest.com/scholarly-journals/syntactic-complexity-recognition-analysis-chinese/docview/3218425698/se-2?accountid=208611
Copyright
© 2025 Yongli Tian. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-06-13
Database
ProQuest One Academic