Assessing the readability of Chinese text plays a significant role in Chinese language education. Because Chinese characters differ intrinsically from alphabetic scripts, readability assessment is more challenging: the language is inherently complex in vocabulary, syntax, and semantics. This article proposes a conceptual analogy between Chinese readability assessment and the rhythm and tempo patterns of music, under which the syntactic structure of Chinese sentences can be transformed into an image. The Chinese Knowledge and Information Processing Tagger (CkipTagger), developed by Academia Sinica, Taiwan, is used to decompose the Chinese text into a set of tokens. These tokens are then refined through a user-defined token pool to retain meaningful units. An image carrying part-of-speech (POS) information is generated by aligning tokens against syntax. A discrete cosine transform (DCT) is then applied to extract the temporal characteristics of the text. In addition, the study integrates four linguistic features for readability assessment: type-token ratio, average sentence length, total word count, and vocabulary difficulty level. Finally, these features are fed into a support vector machine (SVM) for classification. A bidirectional long short-term memory (Bi-LSTM) network is adopted for quantitative comparison. In the experiments, a total of 774 Chinese texts aligned with the Taiwan Benchmarks for the Chinese Language were selected and graded by Chinese language experts, with equal numbers at the basic, intermediate, and advanced levels. The findings indicate that the proposed POS representation combined with the linguistic features works well with the SVM, and that its performance matches that of more complex architectures such as the Bi-LSTM network in Chinese readability assessment.
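The feature-extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the text has already been tokenized and POS-tagged (for example by CkipTagger), that the POS image is a binary matrix with one row per tag and one column per token position, and that the DCT is an unnormalized DCT-II applied along the token axis. The function names, the tag inventory, and the example sentence are all hypothetical. Vocabulary difficulty level is omitted because it requires a graded word list that the abstract does not specify.

```python
import math


def pos_image(tagged_tokens, pos_tags):
    """Build a binary matrix: one row per POS tag, one column per token position.

    Assumed layout for illustration; the abstract does not define the image format.
    """
    row_of = {tag: i for i, tag in enumerate(pos_tags)}
    image = [[0.0] * len(tagged_tokens) for _ in pos_tags]
    for col, (_, tag) in enumerate(tagged_tokens):
        if tag in row_of:  # tokens whose tag is outside the pool are skipped
            image[row_of[tag]][col] = 1.0
    return image


def dct2_1d(signal):
    """Unnormalized DCT-II of a 1-D sequence (a standard definition of the transform)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def linguistic_features(sentences):
    """Type-token ratio, average sentence length, and total word count.

    `sentences` is a non-empty list of non-empty token lists.
    """
    tokens = [tok for sent in sentences for tok in sent]
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "avg_sentence_length": len(tokens) / len(sentences),
        "total_words": len(tokens),
    }


# Hypothetical tagged sentence; the tags are illustrative labels only.
tagged = [("我", "Nh"), ("喜歡", "VK"), ("音樂", "Na")]
image = pos_image(tagged, ["Na", "Nh", "VK"])
# Row-wise DCT: each row describes how one POS tag is distributed along the sentence.
dct_rows = [dct2_1d(row) for row in image]
features = linguistic_features([[tok for tok, _ in tagged]])
```

In a full pipeline, the DCT coefficients and the linguistic features would be concatenated into one feature vector per text and passed to a classifier such as an SVM.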
Details
Information processing;
Data processing;
Language instruction;
Politics;
Simulation;
Insurance policies;
Chinese languages;
Rhythm;
Discrete cosine transform;
Linguistics;
Speech;
Machine learning;
Semantics;
Syntax;
Short term memory;
Support vector machines;
Syntactic structures;
Neural networks;
Reading comprehension;
Complexity;
Syntactic complexity;
Sentences
Bo-Yuan Huang 4; Yi-Chi Huang 4; Yu-Xiang Chen 4
1 The Institute of Chinese Language Education, National Kaohsiung Normal University, Kaohsiung 80201, Taiwan; [email protected]
2 Department of Electrical Engineering, National Chiayi University, Chiayi City 600325, Taiwan; [email protected]
3 Advanced Institute of Manufacturing with High-Tech Innovations, Innovation Building R209, 168 University Road, Ming-Hsiung Township, Chia-Yi 621301, Taiwan
4 Department of Electrical Engineering, National Chung Cheng University, Minhsiung 621301, Taiwan; [email protected] (B.-Y.H.); [email protected] (Y.-C.H.); [email protected] (Y.-X.C.)