Standard methods for assessing English as a Foreign Language (EFL) writing often prioritize top-down criteria that may overlook subtle patterns and styles employed by different learners. We demonstrate a systematic and replicable three-phase approach to capture EFL writing styles using AI and data analytics, highlighting attendant insights and implications. Our approach involves (i) training a deep learning Variational Autoencoder to extract latent stylistic ‘fingerprints’ that are independent of the topic and content of writing, (ii) cluster analysis to identify emergent clusters as styles based on these fingerprints, and (iii) qualitative analysis of exemplars to interpret the styles. A case study application to EFL essays across argumentative, narrative, and reflective genres (N = 892) revealed four writing styles: ‘error-prone ambition’ (characterized by performative displays of advanced but contextually inappropriate vocabulary), ‘striking a balance’ (a more controlled approach to balancing vocabulary and grammar), ‘safe and secure’ (prioritizing clarity with minimal risks), and ‘inconsistent expression’ (deliberate or inadvertent inconsistency in stylistic choices). The clusters defining these styles have different sizes and variances, with overlapping essays hinting at underexplored developmental trajectories in EFL writing. They affirm a broader perspective where writing styles reflect not only linguistic features but also learning attitudes and strategies. Furthermore, our approach minimizes emphasis on lexical and grammatical errors, focusing instead on stylistic regularities that can facilitate less punitive, more nuanced, and more affirming ways to teach EFL writing.
Introduction
In an EFL context, ‘writing style’ refers to the way a writer expresses their thoughts, which may be influenced by various conventions (e.g. linguistic, discursive, registral, cultural) as well as individual dispositions (Hyland, 2003). Acknowledging writing styles is a critical aspect of language education, yet traditional pedagogical and assessment models often fall short (Ferris & Hedgcock, 2013; Hyland, 2007). For example, rubric-driven assessment, whether formative or summative in nature, tends to emphasize adherence to prescribed conventions (Hamp-Lyons, 2007; Weigle, 2002) and to overlook alternative articulations (Matsuda, 2001; Silva, 1993) that may provide subtle yet valuable insights into writers’ learning journeys (Ferris & Hedgcock, 2013). This may lead to an incomplete and myopic evaluation of how people learn to express themselves in a new language.
The case of Chinese, or more broadly, Asian EFL learners provides a salient example of these issues (Hu, 2002; Liu & Braine, 2005). These students often grapple with linguistic interference from their first language, cultural influences and expectations that shape their approach to writing, and the need to adapt to English academic norms that may differ from their prior educational experiences (Connor, 2002; Hu, 2002). For instance, Chinese students might structure arguments or transfer syntactic and rhetorical structures from their native language, which could be misinterpreted or penalized by standard rubrics that are often based on native English speaker norms. Furthermore, the educational background of many Chinese tertiary learners often emphasizes information presentation over personal analysis. The former refers to memorization and the transmission of factual knowledge, while the latter, typical of broadly ‘Western’ academic norms, is characterized by critical argumentation skills, personal interpretation, and the expression of unique insights (Gu & Schweisfurth, 2006). This can influence their approach to academic writing in English, while the pressure to conform to Western academic writing conventions might also inadvertently suppress the development of a unique and authentic writing style in these learners (Canagarajah, 2006; Matsuda, 2001). Consequently, traditional assessment methods may miss the nuances of their developing English literacy and their varied ways of making meaning through writing. This is consistent with research suggesting that rubrics may not always lead to valid judgments, can stifle students' creativity by focusing on specific criteria, and might be interpreted differently by teachers and students (Panadero & Jonsson, 2013; Sadler, 2009; Torrance, 2007).
There is thus a compelling need for alternative data-driven approaches to capture the richness and diversity of student writing styles, their potential relationships with broader notions like learning styles and strategies (Cohen, 2011; Reid, 1995), and attendant pedagogical implications.
Contemporary AI and data analytic techniques offer promising ways to address these concerns (Tay, 2024; Wongvorachan & Bulut, 2025; Zhai et al., 2020). For example, deep learning algorithms can reveal latent stylistic ‘fingerprints’ in a large corpus of essays that go beyond surface-level features like word choices (Manning et al., 2020), capturing stable writing styles that are independent of content. The subtle feature combinations underpinning these styles are usually difficult for human readers using traditional analytical methods to uncover in a systematic and reliable fashion. Variational Autoencoders (VAEs), a type of generative deep learning model, are well-suited for this task (Blei et al., 2017; Bowman et al., 2016; Kingma & Welling, 2022). Unlike traditional autoencoders that encode a fixed representation, or other common approaches like bag-of-words quantification schemes and word embeddings (Tay, 2020, 2021), VAEs are trained to learn a probabilistic distribution over a so-called ‘latent space’ in text data. This allows for the capture of more nuanced and varied representations of writing, potentially distinguishing between subtle stylistic differences. Furthermore, their probabilistic nature enables the modelling of inherent variability in human writing, with a range of potential stylistic expressions rather than a single, fixed representation for each text.
Figure 1 is a schematic diagram of a Variational Autoencoder adapted for text analysis. The process begins with the input text(s) X, which undergo an initial transformation into a suitable numerical representation such as word embeddings (Goldberg, 2016). This numerical representation is then fed into an encoder network for further processing. The encoder's function is to distill the text into two crucial parameters, namely the mean (μ) and variance (σ²), of a probabilistic distribution within a lower-dimensional latent space. A ‘reparameterization trick’ is then applied, which combines the parameters with random noise ϵ (z = μ + σ⊙ϵ) to sample a latent vector z. This latent vector can be seen as a compressed representation of the original input that captures important stylistic or thematic elements in a compact form.
Fig. 1. Schematic VAE diagram
Consider an analogy. Imagine adjusting a sprinkler system that waters a garden. Instead of directly trying to control each individual water droplet's random landing spot (i.e., each text’s properties), you adjust two main knobs: one for the center point where you want the water to land (i.e., the mean of the numerical forms), and another for the water’s overall spread (i.e., the variance of the numerical forms). The reparameterization trick adds a separate, standardized random ‘wobble’, making it much easier to learn how to adjust those two main knobs effectively to water your garden just right. The sampled latent vector is like the particular pattern of water droplets that falls for a given setting of the center and spread knobs, influenced by the added random wobble. It is a unique ‘fingerprint’ of how the sprinkler is set at that moment, capturing its essence in a compact but non-deterministic way.
A decoder then attempts to reconstruct the original input or generate new text X′ from the latent vector, with the objective of matching the original as closely as possible. In essence, this architecture allows the VAE to learn essential, yet nuanced patterns by representing text as probabilistic distributions rather than single fixed points. While VAEs are commonly utilized to train generative AI models to create new texts/images using the decoder, the present application does not generate new essays. Instead, it aims to extract and identify stylistic 'fingerprints' from extensive samples of student writing, hence stopping short of the decoder stage.
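The sampling step described above (z = μ + σ⊙ϵ) can be sketched in a few lines of plain Python. This is a minimal illustration only: the function name `sample_latent`, the three-dimensional toy latent space, and the numerical values are assumptions made for the example, not details of the model trained in this study.

```python
import math
import random


def sample_latent(mu, var, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) drawn independently per dimension."""
    z = []
    for m, v in zip(mu, var):
        sigma = math.sqrt(v)         # standard deviation from the encoder's variance
        eps = rng.gauss(0.0, 1.0)    # the standardized random 'wobble'
        z.append(m + sigma * eps)
    return z


rng = random.Random(0)               # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 2.0]                # hypothetical encoder means (toy 3-D latent space)
var = [0.1, 0.1, 0.1]                # hypothetical encoder variances

# Sampling the same text twice gives nearby but not identical latent vectors.
z1 = sample_latent(mu, var, rng)
z2 = sample_latent(mu, var, rng)
print(z1)
print(z2)

# With zero variance the sample collapses onto the mean.
z_det = sample_latent(mu, [0.0, 0.0, 0.0], rng)
print(z_det)
```

Because the randomness is isolated in ϵ, the mean and variance remain ordinary quantities that a training procedure can adjust, which is the point of the ‘two knobs’ analogy.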
Once these fingerprints are captured, cluster analysis can be applied to identify similar writing styles shared by writers. Cluster analysis is a machine learning technique for grouping objects in such a way that those in the same group (called a cluster) are more similar to one another than to those in other groups (clusters). At the same time, it can also identify the optimal number of groups that describe the dataset. These groups are not pre-defined but ‘emerge’ naturally from the data, making it an excellent tool for exploratory analysis. Discourse analysts have used it to identify, among other things, (dis)similarities within and between speakers in psychotherapy and other contexts (Tay, 2017, 2020; Tay & Qiu, 2022). By applying it to identify clusters of students with similar writing styles, teachers can gain a rigorous data-driven understanding of their diversity and nature, as opposed to a one-size-fits-all approach to instruction and assessment. The above quantitative analyses can be further complemented by qualitative analysis of essays within each identified cluster. This will provide a deeper understanding of the specific linguistic features, thematic choices, and organizational patterns that define each emergent writing style. Additional insights could be gained by analyzing essays that prototypically exemplify each style (i.e. those that occupy central positions in each cluster), those that lie at the periphery, as well as those that lie in overlapping regions between clusters.
The three-phase mixed-method approach outlined above constitutes a novel data-driven and style-focused contribution to EFL writing research. It has several potential pedagogical benefits. Firstly, it can help educators hear student voices by uncovering their unique ways of expressing themselves, potentially revealing insights into their cultural backgrounds, individual learning processes, and perspectives that might be obscured by sole adherence to prescribed conventions. Relatedly, it can democratize the learning environment by facilitating assessment methods beyond predefined criteria and standardized rubrics that might inadvertently privilege or penalize certain writing styles. The idea of identifying and ‘mapping’ fellow students with similar styles may also stimulate creative avenues for them to understand and learn from one another, beyond standard ways of forming student groups. It is crucial to highlight that although implementing the approach may require initial basic training in NLP and deep learning, the skills acquired become a replicable toolkit for ‘scholar-practitioners’ (Whong, 2023) aspiring to interface their frontline professional practice with practical scholarly research. This paper aims to (partly) provide the training with a stepwise demonstration of the approach on a corpus of EFL writing in the tertiary Chinese context. Our main objectives are to (i) evaluate its feasibility and insightfulness in capturing latent student writing styles, and (ii) demonstrate the practical utility and pedagogical value of this approach. Interested readers can visit https://github.com/dennistay1981/Resources for working Python code.
Method
Participants and dataset
The participants were first-year, non-English major associate degree students, aged between 19 and 20 years old. Their listening and reading skills were equivalent to Level 3 of the China Standards of English Language Ability (CSE) while their speaking and writing abilities corresponded to Level 2 of the CSE. The dataset comprises 892 essays of 500–700 words each, collected from two Chinese universities. Informed consent to participate in the study was sought (in accordance with the Declaration of Helsinki) although the specific details of the study were not revealed to students. The essays were written as in-class assessment items without external assistance and with a time limit of 40 minutes. This ensured consistency in the writing process and reflects typical assessment conditions. To ensure a diverse and naturalistic sample of writing, a range of genres and essay topics is represented, with examples like ‘the differences between the education systems of China and other countries’, ‘the challenges of global warming’, ‘the impact of IT development on my studies’, ‘an unforgettable trip’, and ‘a letter to myself’.
Phase 1: VAE training
The VAE aims to learn latent representations that capture stylistic features independent of content. Each representation consists of 50 numerical dimensions (the ‘fingerprints’) that compress the most discriminative features of the underlying writing styles, effectively filtering out less important variations. The dataset was first pre-processed with Hugging Face’s BERT tokenizer and pre-trained BERT model embeddings, which serve as inputs to the encoder (Fig. 1). Unlike simpler word embeddings that capture word-level relationships, BERT's pre-trained deep bidirectional transformers are particularly effective at capturing the nuanced contextual meaning of words within a sentence (Devlin et al., 2019). This is crucial for our objective of extracting latent stylistic fingerprints that go beyond surface-level features. Other technical details are outlined in the footnote.1
Phase 2: Cluster analysis
Phase 1 resulted in each essay being represented by 50 numerical dimensions. In other words, each essay occupies a position in a 50-dimensional space, with the relative proximity of essays indicating similarity in writing style. It is crucial to note that there is no one-to-one correspondence between these numerical dimensions and concrete linguistic features. The former are abstract representations that capture complex, often non-linear, combinations of linguistic patterns that define a style. As is typical for deep learning representations, they are high-level, emergent properties rather than directly traceable features.
In Phase 2, a Gaussian Mixture Model (GMM) was applied to assign each essay to a cluster in this space. GMM was preferred over other clustering algorithms like k-means and hierarchical clustering as it uses probabilistic rather than absolute membership assignment, which can facilitate further uncertainty quantification and theorizing of ‘overlapping’ styles. The optimal number of clusters / writing styles was determined by the Bayesian Information Criterion (BIC) (Chakrabarti & Ghosh, 2011), a standard criterion for model selection that aims to balance between fitting the data well and keeping the model simple. Lower BIC values are generally better, suggesting a model that explains the data well without being unnecessarily complicated. The optimal model was then further evaluated with a between-clusters comparison of feature scores.
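The trade-off that BIC formalizes can be made concrete with a short sketch. The standard definition is BIC = p·ln(n) − 2·ln(L), where p is the number of free parameters, n the number of observations, and L the maximized likelihood. In the illustration below, the helper names `gmm_n_params` and `bic`, the choice of a diagonal covariance structure, and above all the log-likelihood values are hypothetical assumptions for demonstration; they are not results or settings reported in this study.

```python
import math


def gmm_n_params(k, d, covariance="full"):
    """Free parameters of a k-component Gaussian mixture in d dimensions."""
    weights = k - 1                      # mixing weights sum to 1
    means = k * d
    if covariance == "full":
        covs = k * d * (d + 1) // 2      # one symmetric d x d matrix per component
    elif covariance == "diag":
        covs = k * d                     # one variance per dimension per component
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return weights + means + covs


def bic(log_likelihood, n_params, n_samples):
    """BIC = p * ln(n) - 2 * ln(L); lower values are preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# HYPOTHETICAL maximized log-likelihoods for k = 2..5 (not results from this study).
toy_loglik = {2: -9500.0, 3: -9000.0, 4: -8300.0, 5: -8200.0}
n_essays, n_dims = 892, 50

scores = {k: bic(ll, gmm_n_params(k, n_dims, "diag"), n_essays)
          for k, ll in toy_loglik.items()}
best_k = min(scores, key=scores.get)     # the candidate with the lowest BIC

for k in sorted(scores):
    print(k, round(scores[k], 1))
print("selected number of clusters:", best_k)
```

In this toy setup the five-cluster model fits slightly better than the four-cluster one, but not by enough to offset the penalty for its additional parameters, so the four-cluster model has the lowest BIC.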
Phase 3: Qualitative analysis
Lastly, to gain a deeper understanding of the linguistic characteristics that define each of the identified writing styles, a sample of essays was selected from within each cluster for qualitative analysis. Special attention was paid to essays at the cluster centroids. These are the spatial co-ordinates representing the mean scores of each dimension among all members of a cluster. Conceptually, cluster centroids therefore mark the location of prototypical or ‘model’ essays. The analysis involved a close reading of the essays for elements including vocabulary choices, sentence structures, grammatical fluency, and the overall tone of the writing. This is necessary to interpret the earlier machine-driven phases for a comprehensive understanding of each style. As explained above, the VAE provided the numerical "fingerprints" (Phase 1), and the cluster analysis grouped essays based on these (Phase 2). Concrete linguistic elements are not direct discriminating criteria for the VAE itself, but are humanly identified and interpreted in Phase 3 after the computational processes have grouped them based on the latent 50 dimensions.
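Locating the essays nearest a cluster centroid, as described above, can be sketched as follows. The functions `centroid` and `nearest_to_centroid`, the use of Euclidean distance, and the five three-dimensional toy vectors are illustrative assumptions; the study itself works with 50-dimensional fingerprints and does not specify the distance measure here.

```python
import math


def centroid(vectors):
    """Mean of each dimension across all member vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_to_centroid(vectors, top_n=2):
    """Indices of the top_n vectors closest (Euclidean distance) to the centroid."""
    c = centroid(vectors)
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))
    return ranked[:top_n]


# Hypothetical 3-D fingerprints for five essays assigned to one cluster.
cluster = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [1.0, 1.0, 1.2],
    [5.0, 5.0, 5.0],   # an outlying essay that pulls the centroid towards it
]

c = centroid(cluster)
exemplars = nearest_to_centroid(cluster, top_n=2)
print("centroid:", c)
print("indices of the two most prototypical essays:", exemplars)
```

Note that the centroid is an average position and need not coincide with any actual essay, which is why the nearest essays serve as the practical exemplars.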
Results and discussion
Clustering solution
The optimal number of clusters was determined to be four, based on its lowest BIC score (17,887.82) among alternatives. Figure 2 is a scatterplot showing the distribution of essays among the four emergent writing styles A–D. There are 341 essays falling into Style A, 257 in Style B, 187 in Style C, and 107 in Style D. The 2D visualization was derived by reducing the original 50 dimensions into two principal components (PC1, PC2) using Principal Component Analysis (PCA), since direct visualization of 50 dimensions is impossible.
Fig. 2. Distribution of emergent writing styles
In other words, PC1 and PC2 are orthogonal axes in a 2D space that retain the maximum amount of variance (information) from the original 50 dimensions with some inevitable information loss. While they don't have direct linguistic interpretations (e.g. "PC1 represents grammar"), they represent the primary directions of stylistic variation within the dataset. They retained 89.0% of the variance of the original 50 dimensions (PC1 = 66.38%, PC2 = 22.62%), indicating minimal information loss. The black crosses indicate cluster centroids, which as explained above mark the location where model essays are likely to be.
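The notion of principal components ‘retaining’ a share of the variance can be illustrated with a small sketch using NumPy. The function `pca_2d` and the synthetic 50-dimensional data are assumptions made for demonstration only; the data are randomly generated stand-ins, not the essay fingerprints, so the printed proportions will not match the figures reported above.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto the first two principal components.

    Returns the 2-D scores and the fraction of total variance each of the
    two components retains.
    """
    Xc = X - X.mean(axis=0)                        # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                # variance ratio per component
    scores = Xc @ Vt[:2].T                         # coordinates on PC1 and PC2
    return scores, explained[:2]


# Synthetic stand-in data: 200 points in 50 dimensions whose variation is
# dominated by two hidden directions plus a little noise (NOT the study's data).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 50))
X = hidden @ mixing + 0.1 * rng.normal(size=(200, 50))

scores, ratio = pca_2d(X)
print("variance retained by PC1 and PC2:", ratio)
print("total retained:", ratio.sum())
```

The two-column `scores` array is what would be plotted as a scatterplot, while the retained-variance proportions indicate how much of the original structure survives the reduction to two dimensions.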
To further evaluate the quality of this clustering solution, the mean values of principal components 1 and 2 were plotted (Fig. 3) and statistically compared among the four clusters. Significant inter-cluster differences would imply that the cluster centroids have distinct positions in the representation space, which in turn implies a good clustering solution where the four writing styles are sufficiently unique. A MANOVA with Pillai’s trace revealed a significant effect of clusters (A to D) on the combined dependent variables (PC1 and PC2), F(6, 888) = 444.69, p < .001. Follow-up one-way ANOVAs revealed all pairwise differences among the clusters to be significant, p < .001.
Fig. 3. Comparison of mean dimension scores between clusters
Having verified the clustering outcome, we can now proceed to identify model essays closest to each centroid for manual analysis of their linguistic features (Phase 3). Each style will be exemplified through analyses of the two essays nearest the centroids. The discussion will be supported by other key observations from Fig. 2. The first observation is a direct correlation between the size of a cluster (i.e. the number of essays defining each style) and its compactness (i.e. how similar the essays within each style are to one another). The largest style A (N = 341) is also where the essays are closest to one another, while the smallest style D (N = 107) exhibits the greatest inter-essay variance and more outlying essays. Styles B (N = 257) and C (N = 187) are in between. Relatedly, Styles B, C and D have greater inter-cluster overlap, where essays are reasonably close to more than one centroid and could thus exhibit the salient characteristics of more than one style. These observations suggest that the most common style is also more stable and clearly defined, while less common ones could be characterized by more variability and experimentation. Both conceptually and empirically, the styles are therefore not discrete categories but may co-manifest to different extents.
Before elaborating on these observations, we reiterate that the linguistic features to be described below are not the direct discriminative features that the VAE used to categorize essays, and there is no one-to-one correspondence between the features and the VAE’s 50 dimensions. Instead, the VAE, by learning the complex probabilistic distribution of text data, identifies patterns that correlate with humanly perceived features like grammatical accuracy, sentence complexity, or vocabulary appropriateness. The qualitative analysis in Phase 3 is the interpretive step where we, as human researchers, examine essays from each cluster and identify these observable linguistic characteristics that define the emergent styles. This process is crucial for bridging the quantitative machine-driven analysis with qualitative human interpretation.
Style A: Error-prone ambition
Style A is the largest and most compact cluster. It is generally marked by long and often complex sentence structures and advanced vocabulary, suggesting an attempt to showcase lexical knowledge. However, such vocabulary is not always contextually appropriate, and the writing tends to contain basic grammatical errors. Consider an introductory paragraph in Example 1. Expressions of interest are underlined in this and all subsequent examples.
Example 1
In the education industry, a joke is circulating among parents in Beijing: If a kid scores 99 out of 100 on an exam, American parents will certainly praise it, whereas Chinese parents would not satisfy and always complain about the one missing point. This phenomenon is the epitome of tiger parenting distinctions between the western and the eastern. Overall, children can achieve better results under Chinese parenthood at the expense of happiness, whilst their American counterparts can obtain more autonomy from their parents. This article will critically discuss both parenting methods and incorporate the essence of the two methods to propose a balanced parenting pattern that delivers all-round cultivation for children.
Responding to an argumentative question on the differences between the education systems of China and America, the writer aptly attempts in this introductory paragraph to outline key concepts like tiger parenting for subsequent elaboration. They express this structural convention with advanced and/or low frequency vocabulary like ‘phenomenon’, ‘epitome’, and ‘whilst’, and the use of ‘article’ to describe an essay seems to be a deliberate stylistic appropriation of a formal academic genre. The sentences are relatively long, and the final sentence outlines an (overly) ambitious content objective that is consistent with the essay’s vocabulary and stylistic choices. At the same time, however, the paragraph suffers from a range of grammatical and other errors such as verb form (‘would not satisfy’), incomplete nominal constructions (‘the western and the eastern’), and awkward phrasing (‘a joke is circulating’, ‘the epitome of tiger parenting distinctions’, ‘all-round cultivation’). In particular, ‘all-round cultivation’ is likely to be a direct translation of the common Chinese phrase 全面培养. To the extent that such direct L1 influences co-occur with the other elements above, the analysis further supports a cluster-of-errors approach to evaluating FL writing where linguistic, stylistic, attitudinal, and cultural elements all converge.
Example 2, from a narrative essay, demonstrates that the ‘error-prone ambition’ and hyper-formality that characterize Style A are not limited to argumentative writing.
Example 2
The morning sunshine was bright and nice, entering gently through my window. I woke up quickly because I had overwhelming elation about the trip today. It was planned by our school to visit a historical museum. Many classmates also enlisted, and I prepared some snacks and a notebook to write down important knowledge…
Then, the guide began explaining the history of the items, and I tried to listen carefully. However, there were many facts and dates, so it was a little hard to concentrate. Meanwhile, my friend started talking to me about a video he watched on social media. Even though it was funny and interesting, I realized it was distracting me from the important explanation. I tried to return my attention immediately, knowing that my self-discipline was important. Afterwards, we had some free time to explore the museum independently. I chose to visit a special exhibition room that had technology from ancient times. Seeing how much science and technology had changed over the years, I felt amazed and thought deeply about the human advancement.
Finally, we gathered back onto the bus to return home. Overall, the trip brought both happiness and knowledge to us, despite some distractions which I overcame with discipline. I learned that staying focused is significant to fully understand educational experiences. Thus, the historical museum trip was beneficial in a material and spiritual way.
Lexical choices like ‘overwhelming elation’ and ‘enlisted’ echo Example 1 in suggesting that the writers’ advanced vocabulary is undermined by a lack of contextual appropriateness, pragmatic awareness, and collocational knowledge. The sentence structures are likewise fairly complex and prone to grammatical errors like voice (‘It was planned by our school’), awkward constructions (‘the trip brought both happiness and knowledge to us’, ‘in a material and spiritual way’), and causal markers (‘thus’) more typical of academic genres. The similar stylistic elements between the two examples support the idea of latent writing styles being independent of content, and structurally salient enough to be captured by VAE. Lastly, the compactness of Style A suggests that writers exhibiting this style share a relatively consistent approach to writing, characterized by a strategy of striving for sophisticated expression despite limitations in their grammatical accuracy. This shared approach might stem from a perceived necessity to demonstrate a high, if somewhat superficial, level of English proficiency through ‘quick and easy’ impressionistic means. In Chinese EFL contexts, this is perhaps driven by high academic expectations or a desire to impress.
Style B: Striking a balance
Style B, the second largest cluster, also demonstrates attempts at using advanced vocabulary and sentence structures. However, a key difference from Style A lies in a better grasp of grammatical and constructional conventions, resulting in fewer errors and shorter but more coherent sentences. Essays in this cluster seem to take a less performative approach, striking a more careful balance between displaying grammatical and lexical competence. Consider Example 3, which shares the same topic as Example 1.
Example 3
‘Tiger parenting’- a term which took birth after the publication of Amy Chua's book, Battle Hymn of the Tiger Mother in 2011 refers to authoritarian parenting style that is strict, demanding, shames and pressures children to have high academic achievement and involvement in high-status extra-curricular activities, such as violin-playing, piano playing and ballet dancing to name but a few. This essay aims to describe the effects of tiger parenting on children in America and Hong Kong.
There is no doubt that tiger parenting does more harm than good to the development of the children. It has been evidenced in a study that tiger moms produce kids who are more estranged from their parents and experience relatively higher depressive symptoms
While still containing typical errors like awkward collocations (‘took birth’, ‘higher depressive symptoms’) and missing determiners (‘authoritarian parenting style’), there is a striking absence of the questionable stylistic choices seen in the previous examples. Advanced vocabulary (‘authoritarian’, ‘estranged’) is used in more contextually appropriate ways and with greater awareness of register (e.g. ‘essay’ instead of ‘article’ in Example 1). Complex sentences like the first sentence combine embedded, relative, and conjoined clauses with greater proficiency than before.
Likewise, as shown by Example 4, Style B is not restricted to argumentative writing but also manifests in reflective essays like ‘a letter to myself’.
Example 4
To believe in yourself is a big concept, but I know you are sure about what you will do and achieve. First of all, apply for the school scholarship which will prove you have dedicated a lot of time and put in a lot of effort during your study and you have gotten an excellent GPA in the last term. Besides, you can spend the scholarship on your summer trip with your best friends.
Secondly, get a higher GPA in this term. The GPA in last term is not enough to apply for the top-class scholarship, and you are clear that you spent too much time playing volleyball and phone last year, which made you have little time do homework or extra exercises. The last but most importantly, improve your volleyball skill. You are expected to become the leader in the faculty volleyball team and take part in the school competition in September, but you just started to play it since your university life. But do not worry, trust yourself, just like the saying " Practice makes perfect." If you keep spending some time practising playing volleyball every single week after a term, I am certain you will make process and become the leader in the team hopefully. In addition, always remember, volleyball will never give you up, only you can give up playing volleyball and yourself.
A similar profile of errors can be observed, including collocations (‘big concept’, ‘playing volleyball and phone’), determiners and prepositions (‘The GPA in last term’), transitional markers (‘besides’, ‘the last but most importantly’), causal/temporal markers (‘since your university life’), and miscellaneous lexical issues (‘practising playing’, ‘process’). Errors like ‘made you have’ similarly suggest direct L1 influence from the Chinese construction 造成你. Compared to its narrative counterpart in Example 2, however, there are fewer instances of complex structures and advanced vocabulary. The writer instead opts to impress with isolated instances of idiomatic expressions like ‘practice makes perfect’ and creativity, as seen in the final sentence.
We can also observe some overlapping instances between Styles A and B from Fig. 2, where essays are close to both centroids and end up being assigned to either style only by a slight margin. This is primarily due to the shared goal of employing advanced vocabulary and complex sentence structures. Essays within Style B might still exhibit occasional grammatical errors that are reminiscent of Style A, especially when the writer ventures into more challenging linguistic territory. This points towards a potential developmental trajectory where writers are progressing beyond the more error-prone approach of Style A, demonstrating a growing control over grammatical structures while maintaining a focus on using advanced language.
Style C: Safe and secure
In contrast to the performative ambition of Style A and its toned-down counterpart in Style B, Style C is characterized by a primary focus on clear and concise communication. Sentences tend to be shorter with simpler word choices, contributing to the overall clarity and directness of the writing. These writers appear to be adopting a ‘safe and secure’ approach, prioritizing comprehensibility and fluency. Example 5, analyzing the impact of information technology on one’s studies, is illustrative.
Example 5
The development of information technology has had a strong impact on my studies. It has changed the way I access information, complete assignments, and interact with my teachers and classmates. In the past, I relied mostly on textbooks and printed materials. Now, I use the internet every day to support my learning. One of the biggest changes is how easy it is to find information online. Search engines help me locate useful articles, videos, and research papers. I can get different opinions and explanations on the same topic. This helps me understand better and think more critically.
Another benefit is the use of digital tools. Applications like Google Docs, PowerPoint, and educational platforms such as Moodle make it easier to organize my work. I can write essays, create presentations, and take quizzes all in one place. These tools also allow me to work with my classmates on group projects, even when we are not together physically.
In addition, communication has become more efficient. I can email my teachers when I have questions. Online forums and chat groups also give me a chance to discuss topics with classmates. This helps me learn not only from teachers but also from peers.
However, there are also some challenges. Sometimes, it is hard to stay focused because the internet has many distractions. Also, I need to be careful about using reliable sources. Not all information online is accurate.
Overall, the development of IT has made my learning more flexible and efficient. I can study at my own pace and have access to many resources. While there are still difficulties, I believe that IT has improved my education in many ways.
This experience shows how important technology is in modern education. It will continue to shape how we learn in the future.
We see that the writer’s ideas are expressed confidently and logically without unnecessary complexity. Standard argumentative writing conventions like topic sentences and logical-temporal markers (‘in the past’, ‘now’, ‘overall’) are used well, and most sentences are limited to a small number of clauses. The grammar is mostly accurate throughout, partly because simpler structures reduce the potential for errors. The safe formulaic approach adopted by the writer is further evidenced by the slightly misused singular ‘this experience’ in the final sentence, which tends to be a stock expression in standard reflective writing. The vocabulary is accessible and appropriate throughout. On the whole, it appears that the writer is deploying a no-risk strategy upon a foundation of good language skills.
As a timely illustration of the increasing variance from Style A onwards, Example 6 below is a reflective version of Style C that likewise prefers simple structures and words. However, the grammatical errors are far more pronounced, suggesting that the foundational skills seen in Example 5 are by no means a defining feature of this style.
Example 6
Dear myself,
Now I want to make an annual goal and plans for myself in order to push me go ahead and motivate me to enjoy the life. I totally make three plans for myself and they are as followed.
The first one is to pass the College English Test Band 4. It is of significant for me to pass it as it not only can check my English capacity but also benefit for my job in the future. The second one is to find a part-time job. The reason why I want to do it is that I think working experience is very necessary and meanwhile I want to experience the life and exercise myself. The third one is to study hard because I am a student and study is my first important thing to do.
In the end, I hope these three plans can be achieved and I will try my best to do it!
We likewise observe a formulaic approach as the writer carefully and plainly signals the three plans (‘The first/second/third one’), followed by a concluding ‘in the end’. While the sentence structure and word choices are conservative, as is characteristic of Style C, there are multiple familiar grammatical errors involving collocations (‘goal and plans’), determiners (‘the life’), parts of speech (‘of significant’, ‘benefit for my job’), and so on. ‘Push me go ahead’ is likely another result of L1 transfer as serial verb constructions are standard in Mandarin Chinese, as in the direct translation 推动我向前. This example illustrates the earlier point about emergent writing styles as non-discrete categories, be it from a conceptual or empirical (i.e. clustering) perspective. While the computational analyses in Phases 1 and 2 assigned this essay to Style C, presumably based on its core ‘fingerprint’ of concise communication, the frequency and types of grammatical errors present were also found in earlier styles. The most pertinent and valuable insight, however, comes from the very fact that concise communication, rather than grammatical mistakes, was identified as the core trait, in contrast to the more human-like tendency to focus on errors. This is an important example of how machine-led approaches can (paradoxically) be more affirming than human evaluation and feedback, due to their more comprehensive consideration of the data.
Style D: Inconsistent expression
The final Style D is the smallest and most diffuse, which implies a relatively uncommon and variable approach to writing. Some essays might exhibit simpler sentence structures and basic vocabulary, while others might attempt more complex language but with noticeable inconsistencies in grammar and overall style. Such inconsistencies are likely to manifest across as well as within essays. As with the prior discussion of Style C, it is important to emphasize that ‘inconsistency’ should not be seen as undesirable by default. Style D may represent writers who have yet to develop full awareness of writing conventions. However, it may also profile writers who are deliberately experimenting with different ways of expressing themselves. Consider Example 7, an argumentative essay discussing the effects of global warming.
Example 7
Global warming is a serious problem in today's world. It affects many aspects of life. The weather is hotter. Because of increasing greenhouse gases and human activities such as deforestation and industrial emissions, climate patterns are changing drastically which makes many people feel uncomfortable due to extreme temperatures.
Additionally, global warming triggers natural disasters. Floods, droughts, and severe storms occur more frequently. People living in poorer countries find it harder to protect themselves because they lack advanced technology and sufficient resources. As a result, many lose their houses or even their lives.
Animals also suffer greatly from climate change. Polar bears depend on ice for survival. With rising temperatures, their habitats are melting rapidly. Consequently, polar bears struggle to find food and shelter. Similarly, other species face extinction as they cannot adapt quickly to the changing climate.
However, some countries and individuals are attempting to mitigate global warming effects. They promote renewable energies, such as solar panels and wind turbines, which are cleaner alternatives to coal and gas. Unfortunately, renewable energy can be very costly and difficult for poorer regions to implement due to financial and infrastructural challenges.
In conclusion, global warming poses a significant threat, necessitating immediate action from everyone. Governments and individuals must cooperate to find practical solutions. If ignored, future generations will undoubtedly suffer severe consequences. Thus, action must be taken promptly to ensure a sustainable future.
The introductory paragraph has several simple sentences that typify basic descriptive writing, immediately followed by a much more complex construction with multiple relative clauses commonly seen in EFL argumentative writing. The overall appropriate vocabulary is occasionally marked by lexical and collocational choices that range from unconventional (e.g. ‘houses’ instead of ‘homes’, ‘energies’ instead of ‘energy sources’) to awkward and at times redundant (e.g. ‘necessitating’, ‘consequently’, ‘thus’). Some argumentative inconsistencies can also be observed. While the second paragraph competently exemplifies and develops its topic sentence, the third paragraph transitions somewhat abruptly from general climate impacts to the specific example of polar bears. This blending of styles can also be manifested in examples where the writer is at an obviously lower level of competence. Consider the ‘letter to myself’ in Example 8.
Example 8
Dear myself,
At the time of a new term, I'm writing to make some plans and goals. Although I know plans may not able to keep up with changes, maybe it's better to do plans than nothing to do.
I want to prove myself in many aspects to be a better person, who can completely control and charge herself, and to be more powerful to embrace the life. From the point of studying in the university, I wish me will study hard in class and get good grades. As a study monitor in class, getting good grades is closely related to my face. So be cautious and industrious. Just keep the grades in last term is not easy. And this term, I get more subjects which means more challenging. Review the knowledge offenly before the classes. And give more time to revise in the final exam. Don’t burry too much energy on the clubs and organizations, if they can't give me something valuable.
It's important to get the CET-4 certificate. Remember words in apps everyday to expand the vocabulary. Do past papers and learn from them. Many words to say. If I can do these as the plan, I will make more plans.
From a human reader’s perspective, this example is quite likely to be seen as similar to Example 6 in Style C – replete with lexical and grammatical errors that have the unfortunate side effect of occluding more subtle stylistic fingerprints. These familiar errors include incomplete (‘do plans than nothing to do’, ‘from the point of studying’) and awkward constructions (‘at the time of a new term’, ‘to be more powerful to embrace the life’), determiners (‘the life’, ‘the vocabulary’), lexical errors (‘offenly’, ‘burry’), and so on. Our approach instead highlights the stylistic feature of inconsistency which places this writing in the same category as Example 7, revealing otherwise difficult-to-detect commonalities between writers. This inconsistency is most apparent in the mixture of clause types where declaratives are intermittently followed by imperatives in the second paragraph (e.g. ‘As a study monitor in class, getting good grades is…’, ‘So be cautious and industrious’). The imperatives are likely a stylistic choice where the writer imagines talking to themselves in the letter, embedded in the wider surrounding context of describing their situation with declaratives. While it is likely that many EFL teachers would focus on its weak grammar, a better approach might be to also acknowledge the possibility of creative experimentation, which may in turn reflect diverse learning paths, personalities, or varying levels of engagement with the English language.
Before we conclude with a synthesized discussion of insights, potential applications, and future directions, Table 1 summarizes the four emergent writing styles and their characteristics as discussed above.
Table 1. Summary of Styles A-D
| Style | Frequency rank | Variance rank | Characteristics |
|---|---|---|---|
| A: Error-prone ambition | 1st (N = 341) | 4th | Performance of complex structures and vocabulary that is often not contextually appropriate |
| B: Striking a balance | 2nd (N = 257) | 3rd | Balanced and more appropriate use of complex and simple structures and vocabulary |
| C: Safe and secure | 3rd (N = 187) | 2nd | Emphasizes clear and concise communication, prioritizing comprehensibility and fluency over impressiveness |
| D: Inconsistent expression | 4th (N = 107) | 1st | Noticeable inconsistencies within and across essays in word choices, sentence structures, and overall style |
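For readers who wish to derive comparable frequency and variance ranks from their own cluster assignments, the following is a minimal sketch. It assumes that 'variance' is operationalized as the mean squared Euclidean distance of a cluster's points from its centroid, which is only one of several reasonable definitions and is not necessarily the one used in our analysis. The toy points, labels, and the function name `style_summary` are hypothetical.

```python
from collections import defaultdict

def style_summary(points, labels):
    """Rank clusters by size and by spread around their own centroid."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    stats = {}
    for label, pts in groups.items():
        dim = len(pts[0])
        centroid = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
        # Mean squared distance to the centroid (assumed variance measure)
        spread = sum(
            sum((p[i] - centroid[i]) ** 2 for i in range(dim)) for p in pts
        ) / len(pts)
        stats[label] = (len(pts), spread)

    by_size = sorted(stats, key=lambda lab: stats[lab][0], reverse=True)
    by_spread = sorted(stats, key=lambda lab: stats[lab][1], reverse=True)
    return {
        lab: {
            "n": stats[lab][0],
            "freq_rank": by_size.index(lab) + 1,   # 1 = largest cluster
            "var_rank": by_spread.index(lab) + 1,  # 1 = most diffuse cluster
        }
        for lab in stats
    }

# Toy data: a large, tight cluster "A" and a small, diffuse cluster "D"
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (9.0, 1.0)]
labels = ["A", "A", "A", "A", "D", "D"]
summary = style_summary(points, labels)
print(summary)
```

In this toy case the larger cluster is also the tighter one, mirroring the pattern in Table 1 where Style A ranks first in frequency and last in variance.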
Conclusion
This paper demonstrated the combined use of VAE, clustering, and manual analysis to identify EFL writing styles that are not always visible and/or affirmed by traditional assessment models. The methodological approach was designed to be replicable with a reasonably sized corpus that is realistic for interested practitioner-researchers. With reference to our objectives, our approach has been shown to be feasible and insightful, and we will proceed to summarize key insights and implications for teaching, assessment, and avenues for future research.
Firstly, we highlighted the limitations of evaluating EFL writing through a top-down prescriptive approach, as exemplified by the predominant use of rubrics. We instead adopted a data-driven approach that allows latent writing styles to emerge as subtle patterns in a large dataset, organized around a core set of stylistic ‘fingerprints’ that are interpreted through further qualitative analysis. The general advantage of this approach is that it can uncover the potential richness and diversity of EFL styles, and corresponding learner and learning-related insights. A key finding that reflects the inherent variability of human writing is that styles are non-discrete and have different fundamental properties like size and stability. Specifically, we found an interesting relationship between size and stability, where the largest style was also the most stable (Style A) and the smallest was the least stable (Style D). The fact that ‘error-prone ambition’ and its attendant performative strategy seem to be the most frequent and invariant is worth further critical reflection, especially since this strategy tends to backfire if equal attention is not paid to contextual and pragmatic appropriateness.
Furthermore, our analyses of exemplars across argumentative, narrative, and reflective genres suggest that writing styles are underpinned not only by formal linguistic features, but may also reflect more abstract learning attitudes and predispositions (Reid, 1995). For example, our styles A to D may be respectively (re)interpreted as indicating a desire to perform (A), to consider multiple aspects (B), to be prudent and conservative (C), and to be adventurous (D).
They may also be seen as indicating different individual learner strategies (Cohen, 2011). Some learners might prioritize vocabulary acquisition, leading them to adopt a style similar to A or B. Others might focus more on grammatical accuracy and fluency, resulting in a style akin to C. Learners who are more experimental or less concerned with consistency might exhibit the characteristics of style D. Pedagogical influences could also play a significant role. Curricula that heavily emphasize vocabulary acquisition might inadvertently encourage style A, those that focus on basic communicative competence might foster style C, while a lack of explicit instruction or a highly varied educational background among learners could breed style D. The distribution and characteristics of these styles can provide insights for tailoring instruction to the specific traits, needs, and challenges of different learners.
Our findings may also shed light on EFL writing styles from a more diachronic, developmental perspective. Overlapping cases that seem equally far from more than one cluster centroid may suggest a developmental continuum where writers transition between different styles as their proficiency evolves. The possibilities are dynamic and interesting. For example, a writer might initially prioritize clear communication (Style C) and then begin to experiment with more complex language (Style D), or they might be at an early stage and eager to demonstrate their expanding vocabulary (Style A) but learn to become more pragmatic (Style C). While not the primary focus of this paper, overlaps between these styles could reflect such meaningful transitory moments in their learning journey. Since overlapping points in cluster analyses are often seen as indicators of an unsatisfactory solution, our present application is furthermore a good example of how general data science principles like ‘performance’ and ‘interpretability’ need to be critically reconsidered in more nuanced humanistic contexts like EFL writing.
Lastly, an important observation with pedagogical import is that our approach is not oriented towards lexical and grammatical errors, but focuses instead on stylistic regularities that underpin different writing processes and products. This was especially evident from Styles C and D. The many glaring common errors made across both styles may well be picked up on by human readers, with the additional consequence of failing to notice subtle but important differences between them. Our approach instead helped us discern ‘concise communication’ from ‘stylistic inconsistency’ in the midst of this common pool.
It therefore serves as a concrete resource for educators to be less punitive and more affirmative – by identifying, analyzing, and even celebrating these nuances in a systematic and demonstrable way.
We end by suggesting some directions for future research. This study, with its proof-of-concept emphasis, has been most obviously limited by its exclusive reliance on a single corpus of EFL essays. The most straightforward extension would be to apply the approach to larger and more diverse datasets of EFL writing, across different demographic categories, educational levels and cultural contexts, to assess the generalizability of these findings. Investigating the (combined) use of other deep learning and clustering algorithms could also provide further insights and technical enhancement of writing style identification. Another clearly motivated direction is to explore the relationship between emergent writing styles and key learner variables such as motivation, language proficiency levels, and cultural background. Our preliminary remarks on the potential convergence between styles and culturally driven expectations, for example, serve as a starting point. Finally, we also alluded to longitudinal within-subject studies to examine how writing styles evolve over time, in potential response to different changing conditions and/or pedagogical interventions. Such evolution could take the form of essays from the same writer moving categorically between clusters, gradually from a cluster centroid to an overlapping region, or any other interpretable movement pattern. This could then be linked to the design of pedagogical interventions aimed at helping learners manage these transitions.
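As a hedged illustration of how categorical movement between clusters might be tallied in such a longitudinal design, the sketch below counts consecutive style changes in a time-ordered sequence of cluster labels for one writer. No such longitudinal data were collected in this study; the label sequence and the function name `style_transitions` are hypothetical.

```python
from collections import Counter

def style_transitions(labels_over_time):
    """Count changes of style between consecutive essays by one writer."""
    return Counter(
        (prev, curr)
        for prev, curr in zip(labels_over_time, labels_over_time[1:])
        if prev != curr
    )

# Hypothetical writer whose essays were assigned to these styles over time
history = ["A", "A", "B", "B", "C"]
moves = style_transitions(history)
print(dict(moves))
```

Aggregating such counts across many writers would indicate which transitions are common and which are rare, although gradual movement within a cluster would require working with the latent vectors themselves rather than the labels.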
Acknowledgements
Not applicable.
Author contributions
DT conceptualized the topic and methodology, analyzed the data, and wrote the manuscript. DX performed data collection and contributed to the writing. All authors read and approved the final manuscript.
Funding
This work is supported by a Nanyang Technological University Faculty Startup Grant (Award #024271–00001).
Availability of data and materials
The data analysed in the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare that they have no competing interests.
The encoder consisted of a bidirectional GRU (256 hidden units) that produced a 50-dimensional latent vector, parameterized by a mean (µ) and log variance (log σ²). The decoder was a unidirectional GRU (256 hidden units) with a softmax output over the BERT tokenizer's vocabulary. Training used the Adam optimizer (learning rate 0.001), teacher forcing (p = 0.5), and a loss function combining cross-entropy reconstruction and KL divergence (β = 0.1). These settings are mostly hyperparameters that can be adjusted as desired.
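The loss described above can be illustrated with a minimal, framework-free sketch. It assumes the standard closed-form KL divergence between a diagonal Gaussian and a standard normal prior, and a cross-entropy term averaged over tokens; whether the reconstruction term is averaged or summed is an assumption here. The function names and toy inputs are hypothetical, and an actual implementation would use a deep learning library and operate on batches.

```python
import math

def kl_divergence(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

def cross_entropy(correct_token_probs):
    """Mean negative log-probability the decoder gives to the correct tokens."""
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)

def vae_loss(correct_token_probs, mu, log_var, beta=0.1):
    """Reconstruction loss plus beta-weighted KL term (beta = 0.1 as reported)."""
    return cross_entropy(correct_token_probs) + beta * kl_divergence(mu, log_var)

# Toy example: three decoded tokens and a 50-dimensional latent vector
probs = [0.5, 0.25, 0.8]
mu = [0.1] * 50
log_var = [0.0] * 50
loss = vae_loss(probs, mu, log_var)
print(round(loss, 4))
```

A β below 1 down-weights the KL term relative to reconstruction, a common way to trade off latent regularization against reconstruction quality in text VAEs.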
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Blei, DM; Kucukelbir, A; McAuliffe, JD. Variational inference: A review for statisticians. Journal of the American Statistical Association; 2017; 112,
Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., & Bengio, S. (2016). Generating sentences from a continuous space (arXiv:1511.06349). arXiv. https://doi.org/10.48550/arXiv.1511.06349
Canagarajah, AS. Toward a writing pedagogy of shuttling between languages: Learning from multilingual writers. College English; 2006; 68,
Chakrabarti, A., & Ghosh, J. K. (2011). AIC, BIC and recent advances in model selection. In Philosophy of statistics (Vol. 7, p. 605). Elsevier B.V. https://doi.org/10.1016/B978-0-444-51862-0.50018-6
Cohen, A. (2011). Strategies in learning and using a second language (2nd ed.). Routledge.
Connor, U. New directions in contrastive rhetoric. TESOL Quarterly; 2002; 36,
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding (arXiv:1810.04805). arXiv. https://doi.org/10.48550/arXiv.1810.04805
Ferris, D. R., & Hedgcock, J. S. (2013). Teaching L2 composition: Purpose, process, and practice (3rd ed.). Routledge. https://doi.org/10.4324/9780203813003
Goldberg, Y. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research; 2016; 57,
Gu, Q; Schweisfurth, M. Who Adapts? Beyond cultural models of ‘the’ Chinese Learner. Language, Culture and Curriculum; 2006; 19,
Hamp-Lyons, L. (2007). The impact of testing practices on teaching. In: Cummins, J., & Davison, C. (Eds.), International handbook of English language teaching (pp. 487–504). Springer US. https://doi.org/10.1007/978-0-387-46301-8_35
Hu, G. Potential cultural resistance to pedagogical imports: The case of communicative language teaching in China. Language, Culture and Curriculum; 2002; 15,
Hyland, K. Second language writing. Cambridge University Press; 2003; [DOI: https://dx.doi.org/10.1017/CBO9780511667251]
Hyland, K. Genre pedagogy: Language, literacy and L2 writing instruction. Journal of Second Language Writing; 2007; 16,
Kingma, D. P., & Welling, M. (2022). Auto-encoding variational bayes (arXiv:1312.6114). arXiv. https://doi.org/10.48550/arXiv.1312.6114
Liu, M; Braine, G. Cohesive features in argumentative writing produced by Chinese undergraduates. System; 2005; 33,
Manning, CD; Clark, K; Hewitt, J; Khandelwal, U; Levy, O. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences; 2020; 117,
Matsuda, PK. Voice in Japanese written discourse: Implications for second language writing. Journal of Second Language Writing; 2001; 10,
Panadero, E; Jonsson, A. The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review; 2013; 9, pp. 129-144. [DOI: https://dx.doi.org/10.1016/j.edurev.2013.01.002]
Reid, J. (Ed.). (1995). Learning styles in the ESL/EFL classroom. Heinle and Heinle.
Sadler, DR. Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education; 2009; 34,
Silva, T. Toward an understanding of the distinct nature of L2 writing: The ESL research and its implications. TESOL Quarterly; 1993; 27,
Tay, D. Quantitative metaphor usage patterns in Chinese psychotherapy talk. Communication and Medicine; 2017; 14,
Tay, D. A computerized text and cluster analysis approach to psychotherapy talk. Language and Psychoanalysis; 2020; 9,
Tay, D. (2021). COVID-19 press conferences across time: World Health Organization vs. Chinese Ministry of Foreign Affairs. In: Breeze, R., Kondo, K., Musolff, A., & Vilar-Lluch, S. (Eds.), Pandemic and crisis discourse. Communicating COVID-19 (pp. 13–30). Bloomsbury.
Tay, D. (2024). Data analytics for discourse analysis with python: The case of therapy talk. Routledge.
Tay, D., & Qiu, H. (2022). Modeling linguistic (A)synchrony: A case study of therapist–client interaction. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.903227
Torrance, H. (2007). Assessment as learning? How the use of explicit learning objectives, assessment criteria and feedback in post‐secondary education and training can come to dominate learning. Assessment in Education: Principles, Policy and Practice, 14(3), 281–294. https://doi.org/10.1080/09695940701591867
Weigle, SC. Assessing writing. Cambridge University Press; 2002; [DOI: https://dx.doi.org/10.1017/CBO9780511732997]
Whong, M. (2023). The importance of scholarship by language practitioners in higher education. In Best practices in English teaching and learning in higher education. Routledge.
Wongvorachan, T., & Bulut, O. (2025). The use of natural language processing in learning analytics. In: Saqr, M., & López-Pernas, S. (eds.), Advanced learning analytics methods: AI, precision and complexity. Springer. https://lamethods.github.io/ch09-nlp.html
Zhai, X; Yin, Y; Pellegrino, JW; Haudek, KC; Shi, L. Applying machine learning in science assessment: A systematic review. Studies in Science Education; 2020; 56,
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/.