Abstract
Computational tools have become increasingly prevalent in the analysis and evaluation of various linguistic dimensions in second language (L2) writing pedagogy and research. Despite their widespread use, there is limited research investigating the alignment between computationally derived linguistic features and human assessments of academic writing quality. To fill this gap, this study probed the extent to which computational indices of syntactic and lexical features predict human-judged assessments of narrative writing quality. A total of 104 essays written by Iranian undergraduate learners of English as a Foreign Language (EFL) were analyzed using three computational tools: Coh-Metrix, VocabProfiler, and the Tool for the Automatic Analysis of Cohesion (TAACO). The results from correlation and regression analyses revealed that the computational indices of lexical features were significant predictors of human-judged writing quality, with lexical diversity and sophistication emerging as the most significant predictors. Manual coding of syntactic complexity proved to be a stronger predictor of writing quality than computational measures of this text feature. These findings underscore the value of computational tools in L2 writing assessment, while simultaneously highlighting their limitations in capturing the multifaceted nature of writing quality. Furthermore, the results point to an overemphasis on infrequent and diverse vocabulary in current analytic writing rubrics, suggesting that these rubrics should be revised to adopt a more comprehensive perspective on lexical proficiency in L2 writing pedagogy and evaluation.
Introduction
Academic writing plays a crucial role in the academic success of English as a Foreign Language (EFL) students, as it fosters critical thinking, argumentation, and the ability to engage with scholarly discourse (Hyland, 2009; Manchón, 2011). Research in second language acquisition (SLA) emphasizes the pivotal role of both lexical and syntactic development in academic writing, with lexical richness and grammatical complexity often serving as key indicators of writing quality (e.g., Crossley et al., 2012; Hao et al., 2023; Larsen-Freeman, 2006; Lu, 2010; Maamuujav et al., 2021; Yang et al., 2023). Previous research highlights that the development of academic writing skills demands both extensive exposure to the target language and explicit instruction for the improvement of syntactic and lexical features within second language (L2) writing (Crossley & McNamara, 2012; Ortega, 2015). Therefore, understanding the linguistic features that contribute to effective academic writing is crucial for developing pedagogical strategies that support EFL learners in mastering this skill (Crossley & McNamara, 2012; Maamuujav et al., 2021; Yasuda, 2024).
Computational tools offer valuable insights by analyzing syntactic and lexical features that influence the overall quality of writing. Studies leveraging tools such as Coh-Metrix (McNamara et al., 2014), VocabProfiler (Cobb, 2018), and the Tool for the Automatic Analysis of Cohesion (TAACO; Crossley et al., 2015, 2019) have demonstrated the predictive power of lexical sophistication and cohesion indices in human-rated evaluations of L2 writing (e.g., Abdi Tabari & Johnson, 2023; Crossley & McNamara, 2012; Crossley et al., 2014; Kyle & Crossley, 2015; Maamuujav et al., 2021). Despite the growing reliance on computational tools in the analysis of L2 academic writing, there remains a lack of clarity regarding how well these tools align with human judgments of writing quality. While instruments such as Coh-Metrix and TAACO provide detailed linguistic profiles, including measures of syntactic complexity, lexical diversity, and cohesion, there is still limited understanding of which specific computationally derived indices meaningfully correspond to expert human evaluations. This raises important questions about the validity and pedagogical relevance of such tools in educational settings.
Although previous studies (e.g., Kyle & Crossley, 2015) have shown correlations between certain linguistic features and writing quality, the predictive power of these features varies considerably across contexts and genres. This inconsistency points to a pressing need for more nuanced investigations that compare the outputs of computational analyses with actual human-rated assessments, especially within EFL contexts. Notably, lexical sophistication has received substantial attention, while the role of syntactic complexity—particularly in terms of sentence structure, clause types, and phrase embedding—has been comparatively under-explored (Xu & Casal, 2023).
Furthermore, research has often treated computational evaluations and human judgments as separate entities, without exploring their interrelationship in depth. This study seeks to bridge that gap by systematically comparing syntactic and lexical indices from computational tools with human ratings of writing quality. By doing so, it aims to assess not only the accuracy but also the pedagogical usefulness of these tools for evaluating and improving L2 academic writing.
Literature review
Conceptual framework
This study is situated within the field of academic writing and focuses specifically on the linguistic features of narrative essays written by L2 learners. The goal is to investigate the extent to which computationally derived linguistic features—particularly syntactic and lexical characteristics—align with human judgments of writing quality in narrative writing. Narrative writing, as a genre, requires learners to deploy linguistic features that serve both textual organization and expressive content, including the coherent sequencing of events, appropriate use of tense and aspect, and the deployment of descriptive and evaluative language (Labov & Waletzky, 1967). Understanding how these syntactic and lexical features appear in students’ texts offers valuable insights into their developmental trajectories in L2 writing.
Structurally, narratives typically follow a canonical framework comprising orientation, complicating action, evaluation, resolution, and coda (Labov & Waletzky, 1967). The orientation sets the scene by providing context about time, place, and characters; the complicating action introduces events that disrupt the initial situation; the evaluation offers the narrator’s perspective on the events; the resolution details how the conflict is resolved; and the coda brings the narrative back to the present, providing closure. This structure aids in organizing the narrative logically and coherently, facilitating the reader’s understanding.
Syntactic complexity in narrative writing is often achieved through the use of subordinate clauses, which allow writers to express temporal, causal, and logical relationships between events. The development of such complexity is indicative of a writer’s linguistic proficiency and cognitive development. Studies have shown that as children mature, their narratives exhibit increased syntactic complexity, marked by a higher frequency of subordinate clauses and longer mean lengths of T-units (Drijbooms et al., 2017). This progression reflects the growing ability to construct more intricate sentence structures that enhance the depth and clarity of the narrative.
Lexically, narrative essays often exhibit specific features such as the use of past tense verbs and temporal adverbials to indicate the chronological order of events. They frequently employ first-person pronouns to convey personal involvement and authenticity, as well as descriptive language to create vivid imagery and convey emotions. Cohesive devices, including conjunctions and transitional phrases, are employed to ensure coherence and flow between events (Crossley & McNamara, 2010).
Lexical sophistication in narratives involves the use of a diverse and precise vocabulary to vividly depict characters, settings, and events. Effective narrative writing often includes abstract nouns and metacognitive verbs, which enable writers to convey complex ideas and internal states. For example, the use of words like realize, believe, or understand allows for the expression of characters’ thoughts and motivations, adding depth to the narrative (Sun & Nippold, 2012). Additionally, a rich and varied vocabulary contributes to the overall quality of the narrative by enhancing its descriptive power and emotional resonance.
In this study, we examined the extent to which computationally derived linguistic features of EFL narrative essays contribute to human-rated writing quality, as well as the comparative contributions of human coding and computational evaluation of syntactic complexity. The human evaluation of syntactic features of narrative writing was informed by Halliday's (1994) systemic functional linguistics (SFL), which conceives of language as a resource for making meaning in social contexts, organized around three metafunctions: ideational, interpersonal, and textual. Central to SFL is its analysis of clause complexes, in which meanings are built not only within individual clauses but also through relationships between clauses. Hypotactic relations (subordination) embed one clause within another, creating a hierarchical structure: an enhancement relation, for example, adds circumstantial detail such as cause or time (e.g., she arrived early because she wanted coffee; he ran when the bell rang), while a projection relation reports speech or thought. By contrast, paratactic relations (coordination) link clauses of equal status without embedding, using coordinating conjunctions like and, but, or so to chain ideas or actions (e.g., she wrote the report, and he presented it). In our analysis of narrative essays, we applied this clause-complex framework to manually code each subordinate clause and assess how learners deploy both subordination and coordination to achieve narrative cohesion and complexity.
Syntactic features of academic writing
Academic writing, especially within the domain of SLA, is marked by distinctive syntactic characteristics that differentiate it from conversational or informal writing. One of the defining features is syntactic complexity, often realized through the use of subordinate clauses, embedded structures, passive voice, and expanded noun phrases. These grammatical constructions are essential for expressing complex relationships, elaborating on abstract concepts, and presenting arguments in a structured and cohesive manner.
According to Biber et al. (2011), academic discourse is typified by high frequencies of embedded and subordinate clauses, which allow for the integration of multiple propositions within a single sentence. Biber and Gray (2016) further emphasize the prevalence of nominalization and complex noun phrase constructions in academic texts, which contribute to information density and syntactic compactness. The use of passive voice is also widespread, especially in scientific and technical writing, where the focus is often placed on processes or outcomes rather than the agent (Biber & Conrad, 2019). In addition, academic writing typically features coordination and subordination through conjunctions (e.g., although, because, whereas) to establish logical relationships, thereby enhancing cohesion and argumentative clarity (Biber et al., 2011).
To systematically analyze these features, researchers often employ automated text analysis tools such as Coh-Metrix, which evaluates syntactic complexity using indices like sentence length, left-embeddedness, number of modifiers per noun phrase, clausal density, and passive constructions (McNamara et al., 2014). These metrics help reveal the depth of syntactic sophistication in L2 writing, offering quantitative insights that complement qualitative evaluations (Crossley & McNamara, 2014; Kyle & Crossley, 2018).
While a substantial body of empirical research suggests that syntactic complexity is a relevant predictor of writing quality, the findings are far from conclusive, and several conceptual and methodological concerns remain. For instance, Lu (2011) reported that longer and more structurally varied sentences are linked to higher-quality L2 writing, yet this study, like many others, relied heavily on surface-level indices (e.g., mean length of T-units), which may conflate sentence length with complexity without accounting for syntactic variety or depth. Similarly, Norris and Ortega (2009) concluded that increased syntactic complexity correlates with more advanced L2 writing performance.
Casal and Lee (2019) offered a more nuanced view by showing that clausal complexity had limited predictive power for writing quality in academic research papers, whereas phrasal elaboration—particularly nominal density—proved to be a stronger indicator. This highlights the need to reconsider which syntactic features are most informative for evaluating writing proficiency, especially in genre-specific contexts. Likewise, while Maamuujav et al. (2021) found that syntactic complexity measures contributed to human-rated scores, their study did not sufficiently disentangle the unique contributions of individual indices, raising questions about redundancy and multicollinearity. Jiang et al. (2019), in a large-scale study on L2 narrative writing, demonstrated that indices such as mean sentence length and dependent clauses per clause could differentiate between proficiency levels. However, these findings must be interpreted with caution, as the study did not account for the rhetorical and functional use of syntactic structures, which can be equally important indicators of writing quality. Overall, the field continues to grapple with determining which measures of syntactic complexity are most valid, reliable, and pedagogically meaningful—suggesting that the relationship between syntactic complexity and writing quality remains an open and complex question.
Lexical features of academic writing
Lexical features play a pivotal role in establishing the formal, precise, and objective tone of academic writing. Among these features, lexical sophistication—defined as the use of low-frequency, academic, and discipline-specific words—serves as a critical marker of lexical maturity. Advanced L2 writers are more likely to employ such vocabulary to express abstract concepts and engage in scholarly discourse (Laufer & Nation, 1995; Tracy-Ventura, 2017). For example, words such as conundrum, paradigm, or ubiquitous signal a higher level of lexical sophistication compared to general, high-frequency vocabulary.
Another salient feature is lexical diversity, which refers to the breadth of unique vocabulary used in a text. Measures such as the type-to-token ratio (TTR) and the measure of textual lexical diversity (MTLD) are commonly used to assess the range of vocabulary (Crossley et al., 2014). Greater lexical diversity not only indicates vocabulary breadth but also helps maintain reader engagement and avoid repetition.
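To make these two indices concrete, the sketch below is an illustrative gloss only (it is neither the authors' code nor Coh-Metrix's implementation): it computes a simple TTR and a forward-only approximation of MTLD following McCarthy and Jarvis's (2010) logic, in which a new "factor" is counted each time the running TTR drops to the 0.72 threshold. The sample sentence is invented.

```python
# A minimal sketch (not the authors' code or Coh-Metrix's implementation):
# simple type-token ratio (TTR) and a forward-only approximation of MTLD
# (McCarthy & Jarvis, 2010). The full MTLD averages forward and backward passes.
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def ttr(tokens: list[str]) -> float:
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mtld_forward(tokens: list[str], threshold: float = 0.72) -> float:
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:   # current factor is "used up"
            factors += 1
            types, count = set(), 0
    if count:                                  # partial factor for the remainder
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float(len(tokens))

text = ("that evening the rain kept falling and the streets were empty "
        "so I sat by the window and wrote about the day the storm began")
tokens = tokenize(text)
print(f"TTR = {ttr(tokens):.2f}, simplified MTLD = {mtld_forward(tokens):.1f}")
```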
Lexical density, or the proportion of content words (nouns, verbs, adjectives, adverbs) relative to function words, is another key characteristic of academic writing. It reflects the information load of a text and is closely linked to writing proficiency. Research suggests that more proficient L2 writers produce lexically denser texts, demonstrating better vocabulary control and content development (Hou et al., 2016; Kim, 2014).
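As a rough, self-contained illustration of the ratio just described (not VocabProfiler's algorithm), the following sketch treats any token that is not on a small, hand-listed set of function words as a content word and reports the percentage of content words; the function-word list and example sentence are placeholders.

```python
# Illustrative only (not VocabProfiler): lexical density as the percentage of
# content words, using a tiny hand-listed function-word set as a stand-in for
# proper part-of-speech tagging.
import re

FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "so", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "as", "that", "this", "it", "is", "was",
    "were", "be", "been", "he", "she", "they", "we", "i", "you", "his", "her",
}

def lexical_density(text: str) -> float:
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return 100 * len(content) / len(tokens) if tokens else 0.0

print(round(lexical_density("The tired traveller finally reached the old village."), 1))
```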
Furthermore, lexical cohesion—the use of cohesive devices such as reiteration, synonymy, and collocation—enhances textual coherence by linking ideas across sentences and paragraphs. This aspect of cohesion is especially important in academic writing, where the logical flow of arguments is paramount. Tools like Coh-Metrix assess semantic overlap using latent semantic analysis (LSA), while TAACO captures synonym and content word overlap across text segments (Crossley et al., 2015, 2019; Landauer et al., 2007).
Empirical research supports the central role of lexical features in writing assessment. Laufer and Nation (1995) demonstrated that L2 students who used a higher proportion of infrequent vocabulary achieved better writing scores. Kyle and Crossley (2015) found that lexical sophistication was a strong predictor of L2 writing quality. Crossley et al. (2014) and Woods et al. (2023) confirmed that lexical diversity correlates with writing scores, but their findings are often based on surface-level diversity measures, which may not fully capture the richness of lexical proficiency. Uccelli et al. (2012) found a weak correlation between lexical density and writing quality, suggesting that overemphasis on density could be misleading, while Kim (2014) argued that lexical density significantly predicted EFL learners’ academic performance, possibly confounding other linguistic factors.
Lexical cohesion has also been extensively studied as a predictor of writing quality. Crossley and McNamara (2012) found that texts with stronger cohesive links were rated higher in coherence and organization. Kim and Crossley (2018), using both Coh-Metrix and TAACO, reported that high-scoring essays on TOEFL tasks exhibited greater lexical cohesion. More recently, Abdi Tabari and Johnson (2023) showed that local cohesion better predicted narrative essay quality, while global cohesion was more predictive in argumentative writing.
Research questions
While research has examined syntactic and lexical features in writing, significant gaps persist regarding their relative contributions across various academic genres and learning contexts. First, while computational tools like Coh-Metrix and TAACO provide valuable linguistic insights, their correspondence with human-rated writing assessments remains uncertain. Second, syntactic complexity in EFL contexts is under-researched, with findings suggesting it has a lesser impact on writing quality compared to lexical features (Xu & Casal, 2023). Finally, comparative analyses of human-rated and automated evaluations of syntactic complexity could offer a more comprehensive understanding of computational tools’ effectiveness in assessing academic writing (Kyle & Crossley, 2018; Maamuujav et al., 2021). To fill these gaps, this study aims to answer the following research questions:
To what extent do the computationally derived linguistic features of EFL narrative essays contribute to human-rated writing quality?
To what extent do human coding and computational evaluation of syntactic complexity of EFL narrative essays contribute to human-rated writing quality?
Methodology
Design of the study
This study employs a quantitative, correlational design to investigate the relationships between computationally derived linguistic features (specifically syntactic and lexical features) and human-rated writing quality in EFL learners’ essays. The design allows for the analysis of how certain linguistic variables correlate with writing performance as assessed by human raters. The correlational design is especially suited to the study’s aims, as it allows the exploration of how linguistic features relate to the human evaluation of writing quality without manipulating variables.
Participants
A convenience sample of 104 second-year English Language Teaching (ELT) students (71 females and 33 males) from two universities in Iran participated in this study. Participants’ ages ranged from 18 to 29 (M = 20.30, SD = 1.89), and they were native speakers of Turkish (65%), Persian (31%), or Kurdish (4%). The students were enrolled in four classes of one academic writing course, focusing on argumentative, expository, and narrative essays. Each class was held weekly over a 14-week semester and taught by the same professor with a Ph.D. in Applied Linguistics, using the same textbook and assignments.
Narrative writing task
The participants were asked to write a narrative essay, a genre commonly used in academic writing courses that requires detailed descriptions of events and settings (Biber & Conrad, 2019; Yoon & Polio, 2017). The rationale for selecting this task is that narrative writing demands the use of complex sentence structures, particularly those that express temporal, causal, and logical relationships, which are key markers of syntactic complexity. Furthermore, it provides an opportunity to assess the writer’s ability to organize ideas and maintain cohesion, which is also reflective of lexical proficiency. The prompt for the narrative task (see Appendix 1) encouraged participants to share their personal stories, providing a context in which they could naturally engage these elements of writing. This prompt was derived from the Longman Academic Writing Series 3: Paragraphs to Essays (4th Edition) by Oshima and Hogue (2013), a textbook widely recommended in the ELT curriculum for undergraduate students in Iran.
Procedures
The students were initially briefed on the objectives of the study. It was emphasized that participation was voluntary, and participants had the right to withdraw at any time without facing any consequences. Assurance was also provided regarding the confidentiality and anonymity of their data, with a clear explanation that their performance on the tasks would have no effects on their course grades. Notably, all students chose to participate, and each signed a consent form. Since instruction on narrative writing had been completed by the tenth week of the course, the narrative writing task was administered in the subsequent week. This timing ensured that participants were adequately familiar with the genre’s structural and linguistic conventions. The task was conducted within a 60-min session, with students required to write a minimum of 300 words. All students completed the writing task by hand using pen and paper in a supervised classroom setting. During the session, the use of external resources, including smartphones and course materials, was strictly prohibited.
Measuring syntactic and lexical features
The corpus of the study was initially prepared for linguistic analysis. This process involved transcribing the handwritten essays and coding the sentences. For Coh-Metrix analysis, mistakes in spelling were corrected in the essays to ensure an accurate computation of linguistic indices. In accordance with the recommendations of McNamara et al. (2014) for using Coh-Metrix, spelling errors and typographical mistakes were corrected to enhance the precision of lexical feature analyses. Each essay underwent three rounds of meticulous review by an Assistant Professor of Applied Linguistics to ensure that the text met Coh-Metrix’s criteria and was free from such errors. This cleaning process was crucial, as unclean texts containing spelling mistakes or unusual symbols could compromise the validity of Coh-Metrix outputs (Dowell et al., 2016). A sample of the narrative essays inserted in Coh-Metrix is provided in Fig. 1.
Fig. 1 A narrative essay inserted in Coh-Metrix (see PDF for image)
To examine the syntactic complexity of the narrative compositions, both manual coding and the computational tool Coh-Metrix were employed. Two experienced raters, who had Ph.D. degrees in Applied Linguistics, analyzed all sentences in the essays. The raters classified sentences according to three main aspects informed by Halliday’s (1994) SFL approach: (1) the structural complexity of sentences, identifying them as simple, compound, or complex; (2) the various clause types, including adjective, adverbial, nominal, and reduced clauses, as well as different phrase types, such as participial, absolute, and appositive phrases; and (3) issues related to sentence boundaries, such as sentence fragments, run-ons, or grammatically flawed sentences that failed to adhere to standard English grammatical norms. Fragmented, run-on, and flawed structures were categorized as unconventional due to their deviation from standard academic writing norms. Finite complement clauses, such as I believe and he knows, were classified under simple structures (see Table 1). The syntactic indices were manually coded and calculated using normalized counts to ensure comparability across essays of different lengths. Specifically, raw counts of syntactic features were divided by the total number of words in each essay and then multiplied by 100 to obtain frequencies per 100 words. This normalization process mitigates the influence of essay length on syntactic feature counts, providing a more standardized and meaningful basis for comparing syntactic complexity across different texts. To ensure consistency in sentence classification, an inter-coder reliability analysis was performed, which demonstrated an acceptable agreement rate of 86.4%.
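For illustration of the normalization and agreement procedures just described, the sketch below uses invented numbers (it is not the study's materials): raw counts of manually coded sentence types are converted to frequencies per 100 words, and simple percent agreement is computed over two coders' sentence-level classifications.

```python
# Illustrative sketch with invented numbers: normalizing manually coded counts
# to frequencies per 100 words, and computing simple percent agreement between
# two coders' sentence-level classifications.

def per_100_words(raw_count: int, total_words: int) -> float:
    return raw_count / total_words * 100

# e.g., an essay of 330 words containing 9 compound/complex sentences
print(round(per_100_words(9, 330), 2))  # -> 2.73 per 100 words

def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a) * 100

coder_a = ["simple", "complex", "compound", "unconventional", "complex"]
coder_b = ["simple", "complex", "compound", "complex", "complex"]
print(percent_agreement(coder_a, coder_b))  # -> 80.0
```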
Table 1. Manual coding of sentences
| Sentence category | Subcategory/classification | Description | Example |
|---|---|---|---|
| Simple structure | Simple sentence | A sentence containing only one independent clause. | The child played in the garden. |
| Simple structure | Finite complement clause | A structure where a verb is followed by a that clause. | I believe that this story is about perseverance. |
| Compound sentence | Coordinated clauses | Two or more independent clauses linked by a coordinating conjunction. | She studied hard for the test, and she passed it easily. |
| Complex sentence | Finite adverbial clause | A dependent clause that explains or qualifies the main clause. | Although the weather was bad, they decided to go hiking. |
| Complex sentence | Finite noun modifier clause | A clause, such as a relative clause, used to describe or modify a noun. | I admire the teacher who inspires her students every day. |
| Unconventional sentence | Fragment | An incomplete sentence that lacks a complete thought, subject, or predicate. | Running to catch the bus. |
| Unconventional sentence | Run-on | Two or more independent clauses improperly joined without punctuation or conjunctions. | The cat was hungry it ate all the food quickly. |
| Unconventional sentence | Faulty sentence | A sentence with structural or semantic errors that make it unclear or incoherent. | While walking through the park enjoying the flowers and forgetting about the time entirely. |
We also utilized computational tools for the analyses of linguistic features of narrative essays (see Table 2). Informed by prior research on L2 writing (e.g., Biber et al., 2011; Casal & Lee, 2019; Kim & Crossley, 2018; Kyle & Crossley, 2017, 2018; Lu, 2011; Maamuujav et al., 2021), computational indices of sentence length, left embeddedness, number of modifiers per noun phrase, and syntax similarity were selected from Coh-Metrix version 3.0 (McNamara et al., 2014) for syntactic analysis. Sentence length and syntactic variation are key indicators of writing quality since, according to Lu (2011), proficient L2 writers typically produce longer and more varied structures. We selected sentence length, left‐embeddedness, number of modifiers per noun phrase, and syntax similarity because each index taps a distinct dimension of syntactic complexity—namely, overall length/size, depth of subordination, phrasal elaboration, and clause‐to‐clause structural overlap. These dimensions align with Halliday’s (1994) notion of clause complexes and have been empirically validated as strong predictors of L2 writing quality in prior research (Crossley & McNamara, 2014; Kyle & Crossley, 2018; Lu, 2011).
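Of the four Coh-Metrix indices listed, mean sentence length is the only one that can be roughly approximated without a syntactic parser; the toy sketch below (ours, not Coh-Metrix) illustrates the idea, whereas left embeddedness, modifiers per noun phrase, and sentence syntax similarity require parsed trees. The example essay fragment is invented.

```python
# Rough illustration only (not Coh-Metrix): mean sentence length in words,
# using a naive sentence splitter. Left embeddedness, modifiers per noun
# phrase, and sentence syntax similarity require full syntactic parsing.
import re

def mean_sentence_length(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(lengths) / len(lengths) if lengths else 0.0

essay = "I missed the last bus. Because it was raining, I walked home slowly."
print(round(mean_sentence_length(essay), 2))  # -> (5 + 8) / 2 = 6.5
```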
Table 2. Syntactic and lexical features, indices, and computational tools
| Linguistic features | Feature | Index/measure | Tool |
|---|---|---|---|
| Syntactic features | Syntactic complexity | Sentence length (number of words) | Coh-Metrix |
| Syntactic features | Syntactic complexity | Left embeddedness | Coh-Metrix |
| Syntactic features | Syntactic complexity | Number of modifiers per noun phrase | Coh-Metrix |
| Syntactic features | Syntactic complexity | Sentence syntax similarity (across paragraphs) | Coh-Metrix |
| Lexical features | Lexical diversity | Measure of textual lexical diversity (MTLD) for all words | Coh-Metrix |
| Lexical features | Lexical density | Percentage of content words | VocabProfiler |
| Lexical features | Lexical sophistication | Word frequency | Coh-Metrix |
| Lexical features | Lexical sophistication | Age of acquisition | Coh-Metrix |
| Lexical features | Lexical sophistication | Words from the K1 frequency band | VocabProfiler |
| Lexical features | Lexical sophistication | Words from the K2 frequency band | VocabProfiler |
| Lexical features | Lexical sophistication | Words from the Academic Word List (AWL) | VocabProfiler |
| Lexical features | Lexical cohesion | Synonym overlap among sentences and paragraphs | TAACO |
For lexical analysis, a vocabulary profile for each essay was generated using the computational tools Coh-Metrix version 3.0 (McNamara et al., 2014), TAACO version 2.0.4 (Crossley et al., 2016), and VocabProfiler version v.4 (Cobb, 2018). While both TAACO and Coh-Metrix measure various aspects of cohesion, TAACO offers a broader range of global cohesion indices, including measures of synonym overlap. VocabProfiler (available at https://www.lextutor.ca/vp/eng/) is an online tool that analyzes the vocabulary profile of a text by categorizing words based on frequency bands, such as the first 1000 most frequent words in English (K1), the second 1000 (K2), and words from the Academic Word List (AWL).
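To illustrate the band-profiling logic in the spirit of VocabProfiler (this is our sketch, not the tool's code), the snippet below classifies each token as K1, K2, AWL, or off-list and reports percentages; the tiny word lists and example sentence are placeholders, whereas the real tool draws on the full K1/K2 frequency lists and Coxhead's Academic Word List.

```python
# Illustrative sketch only: profiling a text against frequency bands in the
# spirit of VocabProfiler. The tiny word lists below are placeholders for the
# full K1/K2 frequency lists and the Academic Word List (AWL).
import re

K1 = {"the", "a", "to", "that", "story", "day", "school", "was", "went"}
K2 = {"adventure", "journey", "nervous"}
AWL = {"analyze", "significant", "process"}

def band_profile(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z']+", text.lower())
    bands = {"K1": 0, "K2": 0, "AWL": 0, "Off-list": 0}
    for tok in tokens:
        if tok in K1:
            bands["K1"] += 1
        elif tok in K2:
            bands["K2"] += 1
        elif tok in AWL:
            bands["AWL"] += 1
        else:
            bands["Off-list"] += 1
    return {band: 100 * n / len(tokens) for band, n in bands.items()}

print(band_profile("The journey to school was a significant adventure that day"))
```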
Lexical diversity was analyzed using MTLD from Coh-Metrix, selected for its stability across texts of varying lengths, unlike the traditional TTR, which is highly sensitive to text length (Jarvis, 2013). McCarthy and Jarvis (2010) validated MTLD as a robust and reliable measure. Lexical density, reflecting the ratio of content words to total words, was calculated using VocabProfiler. For lexical sophistication, various indices were employed: Coh-Metrix provided data on word frequency (WF), and age of acquisition (AOA), while VocabProfiler assessed the percentage of words from the 1000-word (K1) and 2000-word (K2) frequency bands and the Academic Word List (AWL).
To evaluate lexical cohesion, indices from TAACO were utilized. Following Crossley et al.’s (2016) framework for lexical cohesion, synonym overlap for nouns and verbs across sentences and paragraphs was calculated using TAACO. A summary of the indices used for syntactic and lexical analyses is detailed in Table 2.
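As a rough approximation of what a synonym overlap index captures (this is not TAACO's implementation, only a sketch in its spirit), the snippet below measures, for a pair of adjacent sentences, the proportion of words in the second sentence whose WordNet synonym set shares a lemma with the first sentence; it assumes the NLTK WordNet data can be downloaded, and the example sentences are invented.

```python
# A rough approximation (not TAACO's implementation) of synonym overlap between
# adjacent sentences: the proportion of words in sentence B whose WordNet
# synonym set shares a lemma with the words of sentence A.
import re
import nltk

nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
from nltk.corpus import wordnet as wn

def synonyms(word: str) -> set[str]:
    return {lem.name().lower() for syn in wn.synsets(word) for lem in syn.lemmas()} | {word}

def adjacent_synonym_overlap(sent_a: str, sent_b: str) -> float:
    words_a = set(re.findall(r"[a-z']+", sent_a.lower()))
    words_b = re.findall(r"[a-z']+", sent_b.lower())
    if not words_b:
        return 0.0
    hits = sum(1 for w in words_b if synonyms(w) & words_a)
    return hits / len(words_b)

print(adjacent_synonym_overlap("The child was happy.", "The kid felt glad."))
```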
Data analysis
The resulting corpus consisted of 104 narrative essays, comprising a total of 34,338 tokens (total word count). The average length of the writing samples collected for this study was 330.17 words (SD = 13.87). Analytic scoring scales were used to rate the narrative essays, as they provide more detailed and reliable assessments than holistic scoring by evaluating multiple writing dimensions (East, 2009; Weigle, 2002). This study utilized an adapted version of the scale developed by Jacobs et al. (1981) and revised by Connor-Linton and Polio (2014), a widely recognized and reliable framework for writing assessment (Weigle, 2002). The scale measures five key aspects of L2 writing: content, organization, vocabulary, language use, and mechanics, with each dimension scored from 0 to 20, allowing for a maximum total score of 100. The revised scale includes modified descriptors and equally weighted categories (see Appendix 2), demonstrating high reliability and validity (Connor-Linton & Polio, 2014). Two trained raters, both experienced ELT instructors, independently scored the essays. If the score discrepancy exceeded 15 points, another rater was consulted, and the final score was determined by averaging the three raters’ scores. The inter-rater reliability between the first two raters, calculated via the Pearson coefficient, was satisfactory (r = 0.84).
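For illustration, the sketch below reproduces the two scoring decisions just described with invented scores (it is not the study's data): Pearson inter-rater reliability between the first two raters, and the adjudication rule under which a third rating is averaged in when the first two differ by more than 15 points.

```python
# Sketch with invented scores: Pearson inter-rater reliability and the
# adjudication rule (third rating averaged in when the first two differ by >15).
import numpy as np
from scipy.stats import pearsonr

rater1 = np.array([72, 65, 80, 55, 90, 68])
rater2 = np.array([70, 60, 84, 52, 88, 71])

r, p = pearsonr(rater1, rater2)
print(f"Inter-rater reliability: r = {r:.2f} (p = {p:.3f})")

def final_score(score1: float, score2: float, score3: float | None = None) -> float:
    if abs(score1 - score2) > 15 and score3 is not None:
        return (score1 + score2 + score3) / 3
    return (score1 + score2) / 2

print(final_score(72, 70))        # -> 71.0
print(final_score(80, 60, 67))    # -> 69.0 (discrepancy > 15, third rater averaged in)
```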
Descriptive statistics were generated for linguistic features obtained through manual coding of sentences and quantitative metrics from computational tools. The manual coding process emphasizes systematically identifying patterns of language use within specific discursive contexts through a functional analysis of textual construction. Given the study’s focus on examining linguistic usage and identifying patterns in students’ written outputs, Halliday’s (1994) systemic framework and functional grammar—addressing the structural dimensions of phrases, clauses, and sentences in English—provided the primary analytical lens for this investigation.
SPSS software version 27.0 was employed to obtain inferential statistics and check the assumptions of normality and multicollinearity. To find the predictive power of computationally derived linguistic features of EFL narrative essays in the human-rated evaluations of overall writing quality and the dimensions of vocabulary and language use, Pearson correlations were run followed by linear regression analyses. The same statistical procedures were used to find the comparative contributions of the human-rated and computational evaluations of syntactic complexity of EFL narrative essays to the human-rated assessments of writing quality.
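The study ran these analyses in SPSS; purely for illustration, the sketch below shows the same analytic sequence in Python with pandas and statsmodels, namely Pearson correlations, a multiple regression of writing scores on lexical indices, and VIF/tolerance diagnostics. The file name `essays.csv` and all column names are hypothetical placeholders, not the study's dataset.

```python
# Illustrative sketch (not the study's SPSS procedure): correlations, multiple
# regression, and multicollinearity diagnostics with hypothetical column names.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("essays.csv")          # placeholder file: one row per essay
predictors = ["mtld", "word_frequency", "k2_words", "content_word_pct"]

# Pearson correlations between each index and the human-rated writing score
print(df[predictors + ["writing_score"]].corr()["writing_score"])

# Multiple regression: writing score regressed on the lexical indices
X = sm.add_constant(df[predictors])
model = sm.OLS(df["writing_score"], X).fit()
print(model.summary())

# Multicollinearity diagnostics: VIF and tolerance for each predictor
for i, name in enumerate(predictors, start=1):   # column 0 is the constant
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```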
Descriptive statistics for the scores on narrative writing quality and all 18 manual and automated indices obtained for the linguistic features of EFL narrative essays are provided in Table 3. The correlations among all the variables are also given in Appendix 2. The normality of the distribution for the variables was determined based on skewness values ranging between −2 and +2 and kurtosis values within the range of −7 to +7 (see Table 3), as suggested by Hair et al. (2019).
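The following sketch illustrates that screening rule with invented sample values (not the study's data): a variable is flagged if its skewness falls outside [−2, +2] or its kurtosis outside [−7, +7], following the Hair et al. (2019) thresholds cited above.

```python
# Sketch of the normality screen with invented values: flag a variable if
# skewness is outside [-2, 2] or (excess) kurtosis is outside [-7, 7].
import numpy as np
from scipy.stats import skew, kurtosis

scores = np.array([58, 62, 66, 68, 70, 71, 73, 75, 79, 84])

sk = skew(scores)
ku = kurtosis(scores)          # excess kurtosis (normal distribution = 0)
acceptable = -2 <= sk <= 2 and -7 <= ku <= 7
print(f"skewness = {sk:.2f}, kurtosis = {ku:.2f}, acceptable = {acceptable}")
```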
Table 3. Descriptive statistics for writing quality scores and syntactic and lexical features (n = 104)

| Variable | M | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| Narrative writing score | 68.44 | 10.08 | .12 | −.77 |
| Simple sentences | 26.60 | 5.52 | .39 | −.22 |
| Compound and complex sentences | 48.58 | 7.90 | −.00 | −.58 |
| Unconventional sentences | 24.91 | 6.65 | .07 | −1.08 |
| Sentence length | 18.03 | 4.93 | .45 | −.74 |
| Sentence syntax similarity | .15 | .07 | .81 | .28 |
| Left embeddedness | 2.96 | .80 | .05 | −.88 |
| Number of modifiers | .72 | .07 | .45 | −.21 |
| Measure of textual lexical diversity (MTLD) | 87.94 | 16.45 | −.15 | −.93 |
| Percentage of content words | 31.25 | 6.34 | −.12 | −1.14 |
| Word frequency | 2.49 | .15 | .08 | −.70 |
| Age of acquisition | 300.34 | 10.31 | .33 | .14 |
| 1000-word-frequency vocabulary | 88.96 | .89 | −1.83 | 5.96 |
| 2000-word-frequency vocabulary | 3.90 | 1.17 | −.03 | −.90 |
| Academic word list | 1.16 | .62 | .52 | −1.24 |
| Synonym overlap sentences (nouns) | .23 | .05 | .95 | 1.10 |
| Synonym overlap sentences (verbs) | .39 | .09 | .20 | .05 |
| Synonym overlap paragraphs (nouns) | 3.14 | .97 | .59 | −.04 |
| Synonym overlap paragraphs (verbs) | 4.89 | 1.04 | .18 | −.31 |
Results
Contributions of linguistic features of text to human-judged writing quality
To examine the contribution of the computationally derived linguistic features of EFL narrative writing to human-rated evaluations of writing quality, Pearson correlation analyses were first conducted between the writing quality scores and all 18 syntactic and lexical features (Appendix 2). The results indicated that the participants’ writing quality scores were significantly correlated, positively or negatively, with compound and complex sentences (manual coding), unconventional sentences (manual coding), sentence length (automated index), MTLD, POC, AOA, WF, K2 words, and AWL words (obtained through computational tools). Therefore, only the computationally derived linguistic features that correlated significantly with writing quality were entered into the regression analyses (Table 4). Multicollinearity was not detected among the predictor variables, as the tolerance values were above the 0.20 threshold and the variance inflation factors (VIFs) were below 2.5 (Gordon, 2010; Tabachnick & Fidell, 2012).
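As a brief note on how these two diagnostics relate (our gloss, not part of the original analysis): for each predictor j, tolerance_j = 1 − R_j², where R_j² is obtained by regressing predictor j on the remaining predictors, and VIF_j = 1 / tolerance_j. The two cut-offs are therefore nested rather than equivalent: a tolerance above 0.20 corresponds to a VIF below 5, while a VIF below 2.5 corresponds to a tolerance above 0.40, and the predictors reported here satisfied the stricter criterion.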
Table 4. Linear regression analyses for contributions of computationally derived linguistic features to human-rated evaluation of writing quality
Model 1: syntactic features (R² = .14, F(1, 102) = 17.36***)

| Predictor | B | SE | ß | t | p |
|---|---|---|---|---|---|
| Constant | 54.38 | 3.49 | | 15.55 | <.001 |
| Sentence length | .78 | .18 | .38 | 4.16 | <.001 |

Model 2: lexical features (R² = .83, F(6, 97) = 84.12***)

| Predictor | B | SE | ß | t | p |
|---|---|---|---|---|---|
| Constant | −32.05 | 15.67 | | −2.04 | .044 |
| MTLD | .33 | .03 | .54*** | 8.82 | <.001 |
| Percentage of content words | −.21 | .10 | −.13* | −2.03 | .045 |
| Word frequency | 13.48 | 2.74 | .21*** | 4.91 | <.001 |
| Age of acquisition | .10 | .04 | .10* | 2.16 | .033 |
| K2 words | 3.14 | .51 | .36*** | 6.16 | <.001 |
| AWL words | 1.49 | 1.55 | .09 | 1.41 | .160 |

Model 3: combined syntactic and lexical features (R² = .84, F(7, 96) = 73.14***)

| Predictor | B | SE | ß | t | p |
|---|---|---|---|---|---|
| Constant | −32.32 | 15.59 | | −2.07 | .041 |
| Sentence length | −.16 | .11 | −.08 | −1.41 | .159 |
| MTLD | .32 | .03 | .52*** | 8.28 | <.001 |
| Percentage of content words | −.16 | .10 | −.10 | −1.52 | .131 |
| Word frequency | 15.52 | 3.08 | .24*** | 5.03 | <.001 |
| Age of acquisition | .09 | .04 | .09 | 1.92 | .057 |
| K2 words | 3.33 | .52 | .38*** | 6.35 | <.001 |
| AWL words | 1.69 | 1.06 | .10 | 1.59 | .114 |
n = 104
*p <.05
**p <.01
***p <.001
In the regression analyses, the syntactic measure of sentence length was included in model 1, and six measures of lexical features (i.e., MTLD, POC, WF, AOA, K2 words, and AWL words) were added in model 2 (Table 4). According to model 1, computationally derived syntactic features significantly predicted the human-rated assessment of narrative writing quality, F(1, 102) = 17.36, p < 0.001, R² = 0.14. Syntactic complexity explained 14% of the variance in the human-judged assessment of L2 narrative writing quality; in particular, sentence length (ß = 0.38, t = 4.16, p < 0.001) made a significant contribution. In model 2, the six combined measures of lexical features significantly predicted human-judged scores of writing quality, F(6, 97) = 84.12, p < 0.001, R² = 0.83. Including features of lexical diversity, density, and sophistication, this model explained 83% of the variance in human judgements of narrative writing quality. Inspection of the standardized beta values further reveals that MTLD (ß = 0.54, t = 8.82, p < 0.001), word frequency (ß = 0.21, t = 4.91, p < 0.001), age of acquisition (ß = 0.10, t = 2.16, p < 0.05), and K2 words (ß = 0.36, t = 6.16, p < 0.001) uniquely contributed to the prediction of narrative writing quality. In model 3, the combined computationally derived syntactic and lexical features accounted for 84% of the variance in the human-rated evaluation of L2 narrative writing quality, F(7, 96) = 73.14, p < 0.001, R² = 0.84.
Contributions of human-rated and computational evaluations of syntactic complexity to writing quality
To find the comparative contributions of manual coding and computational evaluation of syntactic complexity to L2 narrative writing quality, Pearson correlation analyses were initially run to find the measures which were significantly correlated with writing quality (see Appendix 2). The three measures of compound and complex sentences (manual coding), unconventional sentences (manual coding), and sentence length (automated index) had significant correlations with the writing quality. These indices were used for the linear regression analyses. Multicollinearity was not an issue for the predictor variables, as the tolerance values were above the 0.20 threshold, and the VIFs were below 2.5 (Gordon, 2010; Tabachnick & Fidell, 2012).
The results indicated that the human-rated assessment of syntactic complexity (model 1) significantly predicted narrative writing quality, F(2, 101) = 46.63, p < 0.001, R² = 0.48; this method explained 48% of the variance in the human-judged assessment of L2 narrative writing quality. Specifically, the use of unconventional sentences (ß = −0.45, t = −4.30, p < 0.001) contributed negatively to the prediction of narrative writing quality, while compound and complex sentences (ß = 0.29, t = 2.74, p < 0.01) contributed positively. The computational evaluation of syntactic complexity (model 2) had a lower, yet significant, predictive power, F(1, 102) = 17.36, p < 0.001, R² = 0.14, indicating that the computational method explained 14% of the variance in human-judged scores of writing quality. Inspection of the standardized beta values shows that sentence length (ß = 0.38, t = 4.16, p < 0.001) uniquely contributed to the prediction of narrative writing quality (Table 5).
Table 5. Linear regression analyses for comparative contributions of human-rated and computational evaluations of syntactic complexity to writing quality
Model 1: human-rated evaluation (R² = .48, F(2, 101) = 46.63***)

| Predictor | B | SE | ß | t | p |
|---|---|---|---|---|---|
| Constant | 67.61 | 9.84 | | 6.86 | <.001 |
| Compound & complex sentences | .36 | .13 | .29** | 2.74 | .007 |
| Unconventional sentences | −.68 | .16 | −.45*** | −4.30 | <.001 |

Model 2: computational evaluation (R² = .14, F(1, 102) = 17.36***)

| Predictor | B | SE | ß | t | p |
|---|---|---|---|---|---|
| Constant | 54.38 | 3.49 | | 15.55 | <.001 |
| Sentence length | .78 | .18 | .38*** | 4.16 | <.001 |
B unstandardized beta, SE standard error, ß standardized beta
n = 104
** p <.01
***p <.001
Discussion
This study explored the alignment of linguistic features of narrative essays reported by computational tools with human-judged writing quality. Additionally, the comparative contributions of human-rated and computational evaluations of syntactic complexity to writing quality were examined. The results indicated that both syntactic and lexical features were strong predictors of human-rated writing quality, with syntactic complexity, lexical diversity, and lexical sophistication as the strongest predictors. Human coding of syntactic complexity was found to be a stronger predictor of human-judged narrative writing quality than computational evaluation.
The relations between the computationally derived syntactic features of narrative essays and human-rated evaluation of writing quality revealed that sentence length was a significant predictor of narrative writing quality. From a systemic functional linguistics (SFL) perspective, this finding aligns with Halliday’s (1994) notion that sentence length and structural variety reflect the ability to manage and organize information effectively. As in previous studies (e.g., Lu, 2011), essays with longer and more varied sentences were rated higher by human evaluators, as these structures allow for more complex ideational and interpersonal functions, such as elaboration and the signalling of logical relationships between clauses. SFL’s distinction between paratactic and hypotactic structures further supports this observation, with hypotactic (subordinate) clauses providing additional detail and nuance to ideas, enhancing the depth and clarity of the writing.
Moreover, this study’s results are consistent with broader academic writing trends, where syntactic sophistication plays a critical role in presenting complex arguments and analyses (Biber et al., 2011; Crossley & McNamara, 2014; Kyle & Crossley, 2018; Maamuujav et al., 2021). In SFL, the capacity to deploy complex syntactic structures reflects higher cognitive and linguistic development, enabling writers to construct more nuanced and coherent texts. Therefore, the findings contribute to the growing body of evidence suggesting that syntactic complexity, particularly in terms of sentence structure and clausal variety, is essential in shaping human-rated perceptions of writing quality.
Lexical diversity was also a significant predictor of human-rated writing quality. Research underscores that essays demonstrating higher lexical diversity tend to receive better scores because such features contribute to clarity, nuance, and complexity, aligning with evaluators’ expectations for high-quality writing. The results support Crossley and McNamara (2012) who found that learners with greater vocabulary knowledge produced essays with higher TTR, a common measure of lexical diversity. The results further support Crossley et al. (2014) and Woods et al. (2023) who found significant relations between measures of lexical diversity and writing performance.
The relationship between computationally derived lexical features of narrative essays and human-rated writing quality revealed that vocabulary from the 2000-word-frequency list and word frequency indices of lexical sophistication were the most significant contributors to human judgements of narrative writing quality. This finding aligns with existing research that underscores the importance of advanced vocabulary in academic writing. Lexical sophistication, typically characterized by the use of less frequent and more academic vocabulary, reflects a writer’s ability to employ nuanced and precise language, which has been shown to correlate with higher perceived quality in academic writing (e.g., Crossley & McNamara, 2012; Kim & Crossley, 2018; Maamuujav, 2021; Maamuujav et al., 2021). Human evaluators are often influenced by the presence of advanced vocabulary, as it signals linguistic proficiency, creativity, and depth of expression (Read, 2000). This emphasis on lexical sophistication is further supported by studies examining its predictive value for writing quality. For instance, Crossley et al. (2011) demonstrated that lexical sophistication was a significant predictor of writing performance across various genres.
The findings of this study reveal that human-coded measures of syntactic complexity were stronger predictors of writing quality than the computationally derived syntactic indices, emphasizing the complexity of syntactic features and their pivotal role in determining academic writing quality. This difference underscores the inherent limitations of computational tools in capturing the nuanced nature of syntactic complexity. While computational indices such as sentence length or frequency of subordinate clauses are valuable, they fail to fully capture the functional and rhetorical roles that syntactic structures play in a text.
Human evaluators, in contrast, are able to interpret syntactic structures within the broader context of writing, considering their effectiveness in enhancing clarity, cohesion, and argumentation. Norris and Ortega (2009) argue that syntactic complexity should not be evaluated solely based on structural frequency or length, but also on how well these structures serve the text’s overall communicative purpose. This ability to assess context-dependent aspects, such as the appropriateness, variety, and rhetorical effectiveness of syntactic choices, positions human coders as better equipped to evaluate writing quality comprehensively.
Computational tools, while efficient for large-scale analyses, often rely on predefined indices that measure surface-level features such as mean sentence length or the number of subordinate clauses. These tools, however, may overlook strategic uses of syntactic structures, such as emphasizing key points, introducing evidence, or creating logical transitions. These elements are especially crucial in academic writing, where syntactic complexity can significantly enhance persuasiveness and readability. Lu (2010) noted that computational indices often focus on the frequency of structures rather than how they contribute to meaning and coherence. Crossley and McNamara (2014) further emphasized that human judgments of writing quality tend to consider a broader range of linguistic features, including cohesion and rhetorical effectiveness—dimensions often underrepresented in computational analyses.
Ultimately, these findings highlight the alignment between human evaluators’ holistic approach to syntactic complexity and the limitations of computational tools that focus on surface-level indices. While computational indices provide valuable insights, they cannot fully replace human judgment, particularly when it comes to assessing the contextual relevance and rhetorical effectiveness of syntactic choices in academic writing. Thus, this study affirms the need for a more integrated approach that combines computational tools with human judgment to achieve a more comprehensive assessment of writing quality.
Conclusion
This study presents several important pedagogical implications for educators, writing task designers, and researchers. It is recommended that teachers design writing tasks that promote the use of complex syntactic structures and advanced vocabulary. For instance, teachers could ask students to revise simple sentences by incorporating more complex grammar and a wider range of vocabulary, or to craft narratives utilizing compound and complex sentences alongside varied and sophisticated word choices. Additionally, educators can employ tools such as VocabProfiler and Coh-Metrix to provide students with valuable insights into the lexical sophistication and diversity of their writing. These tools can offer detailed, objective feedback on lexical features, enabling teachers to tailor their instruction to address specific areas for improvement (McNamara et al., 2014).
Test designers can incorporate computational measures of lexical sophistication and diversity into scoring rubrics, enhancing the objectivity and consistency of writing assessments. By integrating such tools, a more consistent and reliable framework for evaluation can be established, ensuring that evaluations are less subject to the variability of human raters (McNamara et al., 2014; Crossley et al., 2012). Computational tools like Coh-Metrix can be utilized to automate parts of the evaluation process, providing quick and accurate quantification of lexical features, thereby alleviating the workload of human raters while preserving the integrity of the assessment (Attali & Burstein, 2004). These lexical metrics can serve not only as scoring instruments but also as a valuable source of diagnostic feedback for students. This feedback can help learners identify specific areas of strength and weakness in their lexical usage, ultimately guiding them toward more sophisticated and precise language use in their writing.
The lack of a significant correlation between lexical cohesion and human-rated writing quality highlights a potential imbalance in current writing assessment rubrics, which often place disproportionate emphasis on lexical sophistication and diversity. These rubrics tend to prioritize the use of less frequent and more varied vocabulary, sometimes overlooking the importance of lexical cohesion—evident through word associations and collocations—that reflects a deeper understanding and command of vocabulary. This finding suggests that writing assessment frameworks could benefit from a more holistic approach to evaluating lexical knowledge. Such an approach would enable test designers and raters to account for the crucial role of lexical cohesion in effective communication. To this end, raters should be trained to assess writing quality in a balanced manner, considering not only lexical and syntactic complexity but also cohesion, coherence, and rhetorical effectiveness. By broadening the scope of assessment criteria, evaluators can provide a more accurate and comprehensive measure of writing proficiency, rather than focusing predominantly on a limited set of linguistic features (Biber et al., 2011).
For L2 researchers, this study underscores both the potential and limitations of computational tools in writing research. While tools like Coh-Metrix, VocabProfiler, and TAACO provide valuable insights into various linguistic features, researchers must recognize that these tools alone cannot fully capture the complexity of writing quality. The stronger predictive power of human coding in this study highlights the necessity of combining computational analyses with manual coding to obtain a more comprehensive evaluation of writing. This finding reinforces the importance of adopting a multidimensional approach to evaluating syntactic complexity—one that integrates the efficiency of computational tools with the nuanced understanding and expertise that human raters bring to the assessment process.
This study has several limitations that should be considered. First, the sample was drawn from two universities using convenience sampling, which limits the generalizability of the findings. Since undergraduate EFL students are a highly diverse group with varying linguistic needs and educational backgrounds, using larger, more representative samples obtained through probability sampling methods would better capture this diversity and provide findings that are more applicable to the broader EFL population. Although previous studies have employed relatively small sample sizes (e.g., Crossley et al., 2016; Maamuujav, 2021; Maamuujav et al., 2021), the integration of computational tools holds the potential to facilitate the analysis of larger datasets in future research on EFL writing. Second, the study employed only a single narrative writing prompt, which may have introduced the influence of topic familiarity. Future research could address this limitation by incorporating multiple writing prompts or comparing the linguistic features of texts across various academic genres, such as argumentative and narrative writing, to examine how different tasks may affect writing quality and the contribution of specific linguistic features.
Human ethics and consent to participate
Not applicable.
Author’s contributions
M.J.E. did all the steps of this research and wrote the manuscript.
Funding
No funding was received.
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
All procedures performed in the study were in accordance with the 1964 Helsinki declaration and its later amendments and were approved by the Research Ethics Review Committee of the University of Maragheh, Maragheh, East Azerbaijan, Iran. Informed consent was obtained from all the individual participants included in this study, and they were assured of the confidentiality of their responses.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Abbreviations
EFL: English as a Foreign Language
TAACO: Tool for the Automatic Analysis of Cohesion
SLA: Second language acquisition
MTLD: Measure of textual lexical diversity
TTR: Type-to-token ratio
ESL: English as a Second Language
LFP: Lexical frequency profile
TOEFL: Test of English as a Foreign Language
iBT: Internet-based test
SCA: Syntactic complexity analyzer
AWL: Academic Word List
WF: Word frequency
AOA: Age of acquisition
SPSS: Statistical Package for Social Sciences
POC: Percentage of content words
M: Mean
SD: Standard deviation
SE: Standard error
VIF: Variance inflation factor
SFL: Systemic functional linguistics
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Abdi Tabari, M; Johnson, MD. Exploring new insights into the role of cohesive devices in written academic genres. Assessing Writing; 2023; 57, [DOI: https://dx.doi.org/10.1016/J.ASW.2023.100749] 100749.
Attali, Y., & Burstein, J. (2004). Automated essay scoring with e‐rater® v.2.0. ETS Research Report Series, 2004(2). https://doi.org/10.1002/j.2333-8504.2004.tb01972.x
Biber, D; Conrad, S. Register, genre, and style; 2019; Cambridge University Press: [DOI: https://dx.doi.org/10.1017/9781108686136]
Biber, D., & Gray, B. (2016). Grammatical complexity in Academic English: Linguistic change in writing. Cambridge: Cambridge University Press.
Biber, D; Gray, B; Poonpon, K. Should we use characteristics of conversation to measure grammatical complexity in L2 writing development?. TESOL Quarterly; 2011; 45,
Casal, JE; Lee, JJ. Syntactic complexity and writing quality in assessed first-year L2 writing. Journal of Second Language Writing; 2019; 44, pp. 51-62. [DOI: https://dx.doi.org/10.1016/j.jslw.2019.03.005]
Cobb, T. (2018). Compleat lexical tutor. Retrieved from http://www.lextutor.ca.
Connor-Linton, J; Polio, C. Comparing perspectives on L2 writing: Multiple analyses of a common corpus. Journal of Second Language Writing; 2014; 26, pp. 1-9. [DOI: https://dx.doi.org/10.1016/j.jslw.2014.09.002]
Crossley, SA; Kyle, K; Dascalu, M. The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behavior Research Methods; 2019; 51,
Crossley, SA; Kyle, K; McNamara, DS. The tool for the automatic analysis of text cohesion (TAACO): Automatic assessment of local, global, and text cohesion. Behavior Research Methods; 2015; 48,
Crossley, SA; Kyle, K; McNamara, DS. The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality. Journal of Second Language Writing; 2016; 32, pp. 1-16. [DOI: https://dx.doi.org/10.1016/j.jslw.2016.01.003]
Crossley, S.A. & McNamara, D.S. (2010). Cohesion, coherence, and expert evaluations of writing proficiency. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 984–989). Cognitive Science Society.
Crossley, S. A., & McNamara, D. S. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35.
Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66–79. https://dx.doi.org/10.1016/j.jslw.2014.09.006
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2014). Assessing lexical proficiency using analytic ratings: A case for collocation accuracy. Applied Linguistics, 36, 570–590.
Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S. (2011). Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28.
Dowell, N. M., Graesser, A. C., & Cai, Z. (2016). Language and discourse analysis with Coh-Metrix: Applications from educational material to learning environments at scale. Journal of Learning Analytics, 3.
Drijbooms, E., Groen, M. A., & Verhoeven, L. (2017). How executive functions predict development in syntactic complexity of narrative writing in the upper elementary grades. Reading and Writing, 30.
East, M. (2009). Evaluating the reliability of a detailed analytic scoring rubric for foreign language writing. Assessing Writing, 14(2), 88–115. https://doi.org/10.1016/j.asw.2009.04.001
Gordon, R. (2010). Regression analysis for the social sciences. Routledge.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Hampshire: Cengage Learning EMEA.
Halliday, M. A. K. (1994). Introduction to functional grammar (2nd ed.). Edward Arnold.
Hao, Y., Jin, Z., Yang, Q., Wang, X., & Liu, H. (2023). To predict L2 writing quality using lexical richness indices: An investigation of learners of Chinese as a Foreign Language. System, 118, 103123. https://dx.doi.org/10.1016/j.system.2023.103123
Hou, J., Verspoor, M., & Loerts, H. (2016). An exploratory study into the dynamics of Chinese L2 writing development. Dutch Journal of Applied Linguistics, 5, 65–96. https://dx.doi.org/10.1075/dujal.5.1.04loe
Hyland, K. (2009). Academic discourse: English in a global context. Continuum.
Jacobs, H., Zinkgraf, S., Wormuth, D., Hartfiel, V., & Hughey, J. (1981). Testing ESL composition: A practical approach. Newbury House.
Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning, 63.
Jiang, J., Bi, P., & Liu, H. (2019). Syntactic complexity development in the writings of EFL learners: Insights from a dependency syntactically-annotated corpus. Journal of Second Language Writing, 46, 1–13. https://dx.doi.org/10.1016/j.jslw.2019.100666
Kim, J.-Y. (2014). Predicting L2 writing proficiency using linguistic complexity measures: A corpus-based study. English Teaching, 69.
Kim, M., & Crossley, S. A. (2018). Modeling second language writing quality: A structural equation investigation of lexical, syntactic, and cohesive features in source-based and independent writing. Assessing Writing, 37, 39–56. https://dx.doi.org/10.1016/J.ASW.2018.03.002
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49.
Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34.
Kyle, K., & Crossley, S. A. (2018). Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. Modern Language Journal, 102, 333–349. https://dx.doi.org/10.1111/modl.12468
Labov, W., & Waletzky, J. (1967). Narrative analysis: Oral versions of personal experience. In J. Helm (Ed.), Essays on the verbal and visual arts (pp. 12–44). University of Washington Press.
Landauer, T., McNamara, D. S., Dennis, S., & Kintsch, W. (Eds.). (2007). Handbook of latent semantic analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Larsen-Freeman, D. (2006). The emergence of complexity, fluency, and accuracy in the oral and written production of five Chinese learners of English. Applied Linguistics, 27.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45.
Maamuujav, U. (2021). Examining lexical features and academic vocabulary use in adolescent L2 students’ text-based analytical essays. Assessing Writing, 49, 100540. https://dx.doi.org/10.1016/j.asw.2021.100540
Maamuujav, U., Olson, C. B., & Chung, H. (2021). Syntactic and lexical features of adolescent L2 students’ academic writing. Journal of Second Language Writing, 53, 100822. https://dx.doi.org/10.1016/j.jslw.2021.100822
Manchón, R. M. (2011). Writing to learn the language: Issues in theory and research. In R. M. Manchón (Ed.), Learning-to-write and writing-to-learn in an additional language (pp. 61–82). John Benjamins. https://dx.doi.org/10.1075/lllt.31.07man
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. https://doi.org/10.3758/brm.42.2.381
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press. https://dx.doi.org/10.1017/CBO9780511894664
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30.
Ortega, L. (2015). Syntactic complexity in L2 writing: Progress and expansion. Journal of Second Language Writing, 29, 82–94. https://dx.doi.org/10.1016/j.jslw.2015.06.008
Oshima, A., & Hogue, A. (2013). Longman academic writing series 3: Paragraphs to essays (4th ed.). Pearson.
Read, J. (2000). Assessing vocabulary. Cambridge University Press. https://dx.doi.org/10.1017/CBO9780511732942
Sun, L., & Nippold, M. A. (2012). Narrative writing in children and adolescents: Examining the literate lexicon. Language, Speech, and Hearing Services in Schools, 43.
Tabachnick, B. G., & Fidell, L. S. (2012). Using multivariate statistics (6th ed.). Allyn and Bacon.
Tracy-Ventura, N. (2017). Combining corpora and experimental data to investigate language learning during residence abroad: A study of lexical sophistication. System, 71, 35–45. https://dx.doi.org/10.1016/j.system.2017.09.022
Uccelli, P., Dobbs, C. L., & Scott, J. (2012). Mastering academic language: Organization and stance in the persuasive writing of high school students. Written Communication, 30, 36–62. https://dx.doi.org/10.1177/0741088312469013
Weigle, S. C. (2002). Assessing writing. Cambridge University Press. https://dx.doi.org/10.1017/CBO9780511732997
Woods, K., Hashimoto, B., & Brown, E. K. (2023). A multi-measure approach for lexical diversity in writing assessments: Considerations in measurement and timing. Assessing Writing, 55, 100688. https://dx.doi.org/10.1016/j.asw.2022.100688
Xu, Y., & Casal, J. E. (2023). Navigating complexity in plain English: A longitudinal analysis of syntactic and lexical complexity development in L2 legal writing. Journal of Second Language Writing, 62, 101059. https://dx.doi.org/10.1016/j.jslw.2023.101059
Yang, Y., Yap, N. T., & Ali, A. M. (2023). Predicting EFL expository writing quality with measures of lexical richness. Assessing Writing, 57, 100762. https://dx.doi.org/10.1016/j.asw.2023.100762
Yasuda, S. (2024). Does “more complexity” equal “better writing”? Investigating the relationship between form-based complexity and meaning-based complexity in high school EFL learners’ argumentative writing. Assessing Writing, 61, 100867. https://dx.doi.org/10.1016/j.asw.2024.100867
Yoon, H., & Polio, C. (2017). The linguistic development of students of English as a second language in two written genres. TESOL Quarterly, 51, 275–301. https://dx.doi.org/10.1002/tesq.296