1. Introduction
Natural language processing (NLP) [1] refers to converting human language into a form of data that computers can understand, and to analyzing that data and making decisions by simulating human ways of thinking, so as to realize information exchange between computers and human beings. In today's era of big data, NLP technology has become a powerful tool for analyzing text and mining data. The subjective item is a kind of question that can better test students' knowledge accumulation and subjective cognition, since it requires students to reach their own understanding by combining their own knowledge and experience. Teachers must score students according to their level of expression and understanding, not merely judge whether they are right or wrong. In recent years, the theory and technology of deep learning have developed vigorously, strongly pushing all walks of life along the road to intelligence. The education industry is no exception: much intelligent education software has emerged, and the demand for intelligence in homework and examination systems is increasingly obvious [2, 3]. At present, because the answer to an objective question is fixed, scoring it only requires comparing the student's answer with the standard answer, which is a simple programming problem. The answer to a subjective item, however, is not unique, and the intelligence of computers is not yet comparable to that of the human brain: a gap remains between computer scoring and a teacher's level, so automatic scoring of subjective items is still difficult [4]. As NLP theory and technology based on deep learning continue to develop, this gap is gradually narrowing, and automatic scoring of subjective items is achieving better and better results.
Subjective items are particularly common in ideological and political education courses. Compared with other courses, they have stronger subjectivity and flexibility: the score is generally unrelated to the order of the score points in the standard answer, which makes scoring more difficult. An automatic scoring algorithm for subjective items in ideological and political education courses therefore has practical value and significance under the background of artificial intelligence in education. First, it can reduce the score differences caused by graders' subjective factors and improve the fairness of scoring. Second, it can lighten the teaching burden on researchers and teachers. Finally, it can simplify the whole examination process, improve the efficiency of online education platforms, and truly realize intelligent examination [5].
Based on the subjective items of ideological and political courses, this paper analyzes the characteristics of standard-answer texts and student-answer texts and, combined with a deep learning model, studies an automatic scoring algorithm for subjective items in these courses. The model takes the DSSM model as its basic framework and uses the BERT model for text representation to solve the problem of polysemy.
2. Application Status of Short-Text Matching Technology
Short-text matching [6] is a core technology widely used in NLP. It aims to analyze and judge the semantic relationship between two texts and is widely applied in information retrieval [7], question answering systems [8], paraphrase recognition [9], and natural language inference [10]. In information retrieval, users want to find documents related to a given query, so for search engines, matching a given query to the right documents is crucial. Text matching can also match the right answers to questions in a question answering system, which is very helpful for automatic customer service robots and can greatly reduce labor costs. Paraphrase recognition identifies whether two natural-language questions are semantically consistent, while natural language inference focuses on whether a hypothesis can be inferred from a premise. Research on short-text matching is therefore of great significance.
Traditional text matching algorithms mainly solve word matching at the lexical level and suffer from limitations of word meaning, structure, and knowledge. Research on short-text matching has therefore gradually shifted from traditional statistical methods to deep semantic short-text matching models. In recent years, pretraining models such as word2vec [11], GloVe [12], and ELMo [13] have solved the problem of text vectorization.
At present, most short-text matching models consider only the internal information of each text when extracting features, ignoring the interaction information between the two texts, or they interact at a single level only, losing the rich multilevel interaction information between texts. Information retrieval is also a more complex task than other matching tasks: it often takes the form of query-title or query-document matching, and a more complex query might itself be a document, yielding document-document matching. Similarity calculation and retrieval are only one necessary step; more important is ranking, where relevant items are generally recalled by a retrieval method and then reranked. To solve the above problems, we propose an improved short-text matching (ISTM) model based on the transformer [14]. The ISTM model takes DSSM as its basic framework, uses the BERT model to vectorize the text, solving the polysemy problem of word2vec, and uses the transformer encoder to extract text features.
3. Text Scoring Model of Ideological and Political Education Based on Improved Transformer
Aiming at the low accuracy of long-text similarity calculation for subjective items, we transform the long-text similarity problem into multiple short-text similarity problems through a semantic integrity analysis task. A common solution is to use keywords to extract feature vectors of the long texts and then measure the similarity of the corresponding texts by the similarity between the feature vectors. The remaining work is to calculate short-text similarity, so we propose an improved short-text similarity calculation model based on the transformer. The specific steps are as follows:
(1) The similarity value between each semantically complete sentence and each score point in the standard answer is calculated by the model.
(2) A score matrix is then obtained by combining the score value of each score point.
(3) Finally, according to a certain algorithm, the score sequence with the highest total score and no overlapping rows or columns is selected from the score matrix as the final score. The process of the algorithm is shown in Figure 1, and a sketch of the first two steps follows below.
[figure(s) omitted; refer to PDF]
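To make steps (1) and (2) concrete, here is a minimal Python sketch, not the authors' implementation: `similarity(a, b)` is a hypothetical wrapper around the trained short-text matching model, and `point_values[i]` is the score assigned to the i-th score point. Step (3), selecting non-overlapping entries with the highest total, is sketched in Section 3.5.

```python
import numpy as np

def build_score_matrix(score_points, answer_clauses, point_values, similarity):
    """Steps (1)-(2): weight the model's similarity for every
    (score point, answer clause) pair by the score point's value."""
    S = np.zeros((len(score_points), len(answer_clauses)))
    for i, point in enumerate(score_points):
        for j, clause in enumerate(answer_clauses):
            # similarity() is assumed to return a value in [0, 1]
            S[i, j] = point_values[i] * similarity(point, clause)
    return S
```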
3.1. Transformer Encoder
The transformer encoder has two sublayers: a multihead self-attention layer and a feedforward neural network layer. Around each sublayer there is also a sum (residual) connection and a layer normalization step. Its structure is shown in Figure 2.
[figure(s) omitted; refer to PDF]
The input matrix is $X \in \mathbb{R}^{n \times d}$, where $n$ is the sequence length and $d$ is the hidden dimension.

The self-attention mechanism computes the query matrix $Q$, key matrix $K$, and value matrix $V$ by linear projections of the input:
$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}.$$

The output matrix of scaled dot-product attention is
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where $d_k$ is the dimension of the key vectors.

The output matrix of the multihead self-attention sublayer concatenates the $h$ attention heads and projects the result:
$$\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}.$$

Sum and layer normalization are performed around each sublayer, where
$$\mathrm{LayerNorm}(X + \mathrm{Sublayer}(X))$$
denotes the residual connection followed by layer normalization.

Finally, the feedforward neural network layer applies two linear transformations with a ReLU activation in between:
$$\mathrm{FFN}(x) = \max(0,\ xW_1 + b_1)W_2 + b_2.$$
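As a reference for the structure just described, here is a minimal PyTorch sketch of one encoder block, with the sum (residual) connection and layer normalization around each sublayer; the dimensions follow BERT-BASE, but this is an illustrative reimplementation, not the authors' code.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Multihead self-attention + feedforward network, each followed
    by a sum (residual) connection and layer normalization."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        a, _ = self.attn(x, x, x)            # self-attention: Q = K = V = x
        x = self.norm1(x + self.drop(a))     # sum + layer normalization
        return self.norm2(x + self.drop(self.ffn(x)))
```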
3.2. Input of Ideological and Political Text
Text vectorization is an important step in natural language processing. Word2vec is one of the earliest pretraining models, and most previous work used it to vectorize text; it is simple, fast, and highly general. However, it is limited by the corpus: its modeling is relatively simple and cannot reflect the multilayer characteristics of words, namely grammar and semantics. BERT, built on the transformer, uses the transformer's bidirectional encoder structure and can reflect these multilayer characteristics. This paper uses the Chinese model of BERT-BASE, which has 12 transformer encoders, each with 12 attention heads, and a hidden-layer dimension of 768.
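One common way to obtain such vectors, assuming the Hugging Face transformers library and its `bert-base-chinese` checkpoint (the paper does not specify its tooling), is:

```python
import torch
from transformers import BertModel, BertTokenizer

# BERT-BASE Chinese: 12 encoder layers, 12 attention heads, hidden size 768.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def vectorize(text: str, max_len: int = 25) -> torch.Tensor:
    """Return context-dependent token embeddings of shape (seq_len, 768)."""
    inputs = tokenizer(text, truncation=True, max_length=max_len,
                       return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state.squeeze(0)

# The same character receives different vectors in different contexts,
# which is how BERT addresses polysemy.
emb = vectorize("马克思主义基本原理")
```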
3.3. Expression of Ideological and Political Texts
For the multilayer encoder structure of the transformer model, the low-level network learns rich syntactic features, and the higher the level, the closer the features are to semantic information. In order to let the two texts interact at multiple levels, we add an interactive attention sublayer on top of the transformer encoder. Suppose that the embedding matrix of the first text is represented as $A \in \mathbb{R}^{m \times d}$ and that of the second text as $B \in \mathbb{R}^{n \times d}$.

Then, the interactive attention matrix between the two texts is
$$E = AB^{\top} \in \mathbb{R}^{m \times n},$$
where $E_{ij}$ reflects the relevance between the $i$-th token of the first text and the $j$-th token of the second.

A max-pooling operation on each row of the interactive attention matrix $E$ gives, for every token of the first text, the strength of its best alignment with the second text.

Max-pooling each column of the interactive attention matrix symmetrically gives, for every token of the second text, the strength of its best alignment with the first text. These pooled interaction features are passed on together with the encoder output.
In this way, each time the two texts are encoded, they exchange information. Depending on the number of encoders, the two texts can interact at different levels, which not only encodes the context information in depth but also yields enhanced interactive information.
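A minimal sketch of the interaction step, under the notation above (the exact way the pooled features are fused back into the encoder is an assumption):

```python
import torch

def interactive_attention(A: torch.Tensor, B: torch.Tensor):
    """A: (m, d) embeddings of text 1; B: (n, d) embeddings of text 2.
    Returns the interaction matrix and its row/column max-pooled features."""
    E = A @ B.T                     # interactive attention matrix, (m, n)
    a_feat = E.max(dim=1).values    # per-token best alignment of text 1, (m,)
    b_feat = E.max(dim=0).values    # per-token best alignment of text 2, (n,)
    return E, a_feat, b_feat
```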
3.4. Similarity Prediction of Ideological and Political Texts
After passing through the representation layer, the two texts are encoded as matrices rich in contextual and interactive information. Suppose the two matrices are pooled into fixed-length text vectors $u$ and $v$; the similarity is then inferred from the concatenation vector
$$z = [u;\ v;\ |u - v|;\ u \odot v],$$
which is fed into a fully connected layer with softmax to predict the matching probability.

Among them, $|u - v|$ captures the element-wise difference between the two texts and $u \odot v$ their element-wise similarity.
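A hedged sketch of this prediction head (the pooling operation and hidden sizes are assumptions, not reported by the paper):

```python
import torch
import torch.nn as nn

class SimilarityHead(nn.Module):
    """Pool each encoded text to a vector, build the concatenation vector
    [u; v; |u - v|; u * v], and output match / no-match probabilities."""

    def __init__(self, d_model=768, d_hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4 * d_model, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, 2))

    def forward(self, Ha, Hb):              # (batch, m, d), (batch, n, d)
        u = Ha.max(dim=1).values            # max-pool text 1 -> (batch, d)
        v = Hb.max(dim=1).values            # max-pool text 2 -> (batch, d)
        z = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return torch.softmax(self.mlp(z), dim=-1)
```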
3.5. Score Calculation
Through the short-text similarity calculation model, we can calculate the similarity value between each score point in the standard answer and each semantically complete sentence in the student's answer; combining the score value of each score point, we obtain a score matrix $S$, where element $S_{ij}$ is the score the $j$-th answer clause would receive on the $i$-th score point.
The next step is to select scores from the score matrix to compute the final score of the student's answer. Note that every time a score is selected, the elements in the same row or column can no longer be selected, because each score point in the final score sequence must correspond to exactly one student answer clause; otherwise, one answer clause could be scored at multiple score points.
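The paper does not name the "certain algorithm" of step (3); selecting entries with the highest total score and no shared row or column is exactly a maximum-weight assignment problem, so one standard realization of the global optimal strategy is SciPy's Hungarian-style solver, contrasted below with the greedy strategy evaluated in Section 4:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def greedy_select(S: np.ndarray) -> float:
    """Greedy strategy: repeatedly take the largest remaining score,
    then block its row and column."""
    S = S.astype(float).copy()
    total = 0.0
    for _ in range(min(S.shape)):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if S[i, j] < 0:
            break
        total += S[i, j]
        S[i, :] = -1.0          # a score point may be used only once
        S[:, j] = -1.0          # an answer clause may be scored only once
    return total

def global_optimal_select(S: np.ndarray) -> float:
    """Global optimal strategy: maximum-weight assignment with
    no two selected scores sharing a row or column."""
    rows, cols = linear_sum_assignment(S, maximize=True)
    return float(S[rows, cols].sum())
```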
4. Experiment and Analysis
4.1. Data Set
The automatic scoring algorithm for subjective items is studied for ideological and political courses. For the semantic integrity analysis task, we collected an ideological and political corpus covering Principles of Marxism ("Ma Yuan"), Introduction to Mao Zedong Thought ("Mao Gai"), Ideological and Moral Cultivation ("Si Xiu"), modern Chinese history, and other courses. After removing redundant and useless characters from the original corpus, we obtained 12,400 semantically complete sentences, about one million characters. The order of the sentences was then shuffled to make the data more evenly distributed, each character was annotated by Jieba part-of-speech tagging combined with manual tagging, and sentences were separated by newlines. Finally, the data set was split in a ratio of 6 : 2 : 2, as sketched below.
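A sketch of this preprocessing, assuming the Jieba library's `posseg` module for part-of-speech tagging (the manual correction step is not shown):

```python
import random
import jieba.posseg as pseg

def tag_sentence(sentence: str):
    """Part-of-speech tag each token with Jieba; the output is later
    corrected by manual tagging."""
    return [(word, flag) for word, flag in pseg.cut(sentence)]

def split_622(sentences: list, seed: int = 42):
    """Shuffle, then split into train / validation / test at 6 : 2 : 2."""
    random.Random(seed).shuffle(sentences)
    a, b = int(0.6 * len(sentences)), int(0.8 * len(sentences))
    return sentences[:a], sentences[a:b], sentences[b:]
```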
Due to the scarcity of Chinese data sets for subjective item scoring, we collected a number of "standard answer-student answer" pairs from ideological and political course exams, with each course accounting for a relatively balanced proportion. From these answer pairs, 1,000 "score point-student answer clause" pairs were extracted, and each sentence pair was labeled with a similarity value according to its score ratio, yielding the data set for the short-text similarity calculation task. We also split this data set in a ratio of 6 : 2 : 2. In addition, we reserved 100 answer pairs, with the total score scaled to 10 points, for automatic scoring of whole subjective items.
4.2. Model Parameters
The main model parameters set in this experiment are shown in Table 1.
Table 1. Experimental parameter settings.
Parameter | Value
Encoder layers | 2
Number of attention heads | 8
Hidden layer dimension | 768
Dropout | 0.10
Model optimizer | Adam
Maximum sequence length | 25
Batch size | 512
When training the model, we must watch for convergence: once the model has converged, training should stop, since continuing would overfit the model and fail to achieve the desired effect. The convergence of the short-text matching model is shown in Figure 3.
[figure(s) omitted; refer to PDF]
As shown in Figure 3, the model begins to converge after about 25 training epochs, so the number of training epochs is set to 30.
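The paper fixes the number of epochs after inspecting the convergence curve; an equivalent automatic check is early stopping on validation loss, sketched below with hypothetical `train_epoch` and `validate` callables:

```python
def train_with_early_stopping(model, train_epoch, validate,
                              max_epochs=30, patience=3, min_delta=1e-4):
    """Stop once validation loss has not improved by min_delta
    for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch(model)
        loss = validate(model)
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                break           # converged around this epoch
    return model
```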
4.3. Evaluation Index
Recall and precision are standard measures of the accuracy of a prediction model, and this paper uses both to judge the model; by continually adjusting the sampling mode and proportion, the accuracy of the model is steadily improved. We use the F1 value and the accuracy Acc as evaluation indexes, with
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2PR}{P + R}, \quad Acc = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$, $FP$, $TN$, and $FN$ denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
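For reference, a plain-Python computation of these indexes from binary match labels (1 = match):

```python
def f1_and_accuracy(y_true, y_pred):
    """F1 and Acc from binary labels, per the formulas above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return f1, (tp + tn) / len(pairs)
```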
4.4. Results and Discussion
4.4.1. Comparison Results of Different Models
Several classic text matching models are selected for experimental comparison: RNN (Experiment 1), BiRNN (Experiment 2), GRU (Experiment 3), BiGRU (Experiment 4), LSTM (Experiment 5), and BiLSTM (Experiment 6). In addition, to enhance parallel computing ability and feature extraction ability, Zhao proposed replacing the deep neural network of DSSM with transformer encoding components; to strengthen the comparison, this transformer-DSSM (Experiment 7) [15] is also included. Experiment 8 is our model.
The experimental results of each model are shown in Figure 4, where the abscissa is the experiment number and the ordinate is the percentage value of the different indexes. The text vectorization used by each model is BERT by default; otherwise, it is word2vec.
[figure(s) omitted; refer to PDF]
The results show that the F1 value of the ISTM model reaches 86.80% and its accuracy reaches 86.30%, better than the other models. Experiments 6 and 7 show that the BERT representation is superior to word2vec, and the F1 values and accuracy of the other RNN-family models (Experiments 1 to 5) are likewise significantly ahead of Experiment 7. Comparing Experiments 1 to 6 with Experiment 8 shows that the transformer encoder has better feature extraction ability than RNNs. In addition, Experiments 7 and 8 show that the multilevel information interaction of our model does improve short-text matching, mainly reflected in higher F1 value and accuracy. The ISTM model achieves a better matching effect because it uses the BERT model for text vectorization, which solves the polysemy problem; because the transformer encoder has better feature extraction ability; and because multilevel information interaction gives the two texts rich interactive information, which benefits short-text matching.
4.4.2. Scoring Effect of Ideological and Political Texts
After training and integrating the semantic integrity analysis model and the short-text similarity calculation model, subjective items can be scored. We consider an automatic score correct if it differs from the manual score by no more than 10% of the total score (10 points); otherwise it is an error. We use the greedy strategy and the global optimal strategy to score the 50 reserved "standard answer-student answer" pairs. The accuracy of the two strategies compared with real manual scoring is shown in Figure 5.
[figure(s) omitted; refer to PDF]
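The correctness criterion translates directly into code; a small sketch (the 10% tolerance and 10-point total are from the text above):

```python
def scoring_accuracy(auto_scores, manual_scores, total=10.0, tolerance=0.10):
    """A prediction counts as correct when it deviates from the manual
    score by no more than 10% of the total score (1 point out of 10)."""
    correct = sum(abs(a - m) <= tolerance * total
                  for a, m in zip(auto_scores, manual_scores))
    return correct / len(auto_scores)
```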
A sample of subjective items from the final examination of an ideological and political course for university freshmen was randomly selected, and the students' answers and reference answers were loaded into the subjective item scoring system. The test paper numbers of the students' answers are 01, 02, ..., 49, and 50.
From the selected samples, it can be seen that generally ideal scoring results have been achieved, with a few discrepancies in the scores of certain samples. This error may come from two sources. One is the text segmentation model proposed in this paper, which occasionally produces improper word segmentations. The other is that the subjective items were evaluated manually, so personal subjective opinions may introduce variation. On the whole, the ideological and political education text scoring model based on the improved transformer achieves relatively ideal results in practical application.
On the basis of all the previous improvements, and as expected, the global optimal strategy scores well and is clearly better than the greedy strategy. The greedy strategy focuses only on the current maximum value, so the selected scores often differ considerably, whereas the global optimal strategy starts from the whole and selects scores that are more evenly distributed, which matches how teachers grade students' answers against the score points in real life.
5. Conclusion
Under the background of intelligent education, this paper presents an automatic scoring algorithm for subjective questions in ideological and political courses. Aiming at the low accuracy of long-text similarity calculation, a short-text similarity calculation model based on the transformer is constructed and used to calculate the similarity between each score point in the standard answer and each clause in the student's answer, achieving a better scoring effect. The test results show that the transformer encoder has better feature extraction ability, that multilevel information interaction gives the two texts rich interactive information, and that short-text matching is thereby improved. In the practical application of ideological and political teaching, the desired effect has been achieved. However, we did not train the model with data of different sample sizes, nor did we measure the training time of the model; further work will address these points.
[1] A. A. Kimia, G. Savova, A. Landschaft, M. B. Harper, "An introduction to natural language processing," Pediatric Emergency Care, vol. 31 no. 7, pp. 536-541, DOI: 10.1097/PEC.0000000000000484, 2015.
[2] P. Wang, T. Xiaoyong, S. Qiaoyu, "Interpretable educational artificial intelligence: system framework, application value and case analysis," Journal of Distance Education, vol. 39 no. 6, pp. 20-29, 2021.
[3] L. Quanlong, Y. Yulong, B. Yinglei, G. Xiaofei, "Visual analysis of personalized education research supported by “artificial intelligence + deep learning”," China Adult Education, vol. 6, pp. 30-37, 2021.
[4] B. Yang, Y. Q. Yao, "Research on automatic scoring algorithm of Chinese subjective item based on text mining," Advances in Science and Technology, vol. 105, pp. 377-383, DOI: 10.4028/www.scientific.net/AST.105.377, 2021.
[5] Y. Li, C. Huang, L. Ding, Z. Li, Y. Pan, X. Gao, "Deep learning in bioinformatics: introduction, application, and perspective in the big data era," Methods, vol. 166, DOI: 10.1016/j.ymeth.2019.04.008, 2019.
[6] P. Liang, L. Yanyan, J. Xu, "A survey of deep text matching," Journal of Computer Science, vol. 40 no. 4, pp. 985-1003, 2017.
[7] W. C. Chang, D. Jiang, H. F. Yu, C. H. Teo, J. Zhang, K. Zhong, K. Kolluri, Q. Hu, N. Shandilya, V. Ievgrafov, J. Singh, "Extreme multi-label learning for semantic matching in product search," Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2643-2651, DOI: 10.1145/3447548.3467092, 2021.
[8] Y. Wang, H. Qitao, "Intelligent system research," Electronic Technology and Software Engineering, vol. 5, pp. 174-175, 2019.
[9] C. Xin, L. Weikang, Y. Hong, "Research on multiple convolution self interactive matching method for question repetition recognition," Chinese Journal of Information Technology, vol. 33 no. 10, pp. 99-108, 2019.
[10] L. Guanyu, Z. Pengfei, J. Caiyan, "An attention enhanced natural language reasoning model," Computer Engineering, vol. 46 no. 7, pp. 91-97, 2020.
[11] B. Jang, I. Kim, J. W. Kim, "Word2vec convolutional neural networks for classification of news articles and tweets," PloS One, vol. 14 no. 8, article e0220976, DOI: 10.1371/journal.pone.0220976, 2019.
[12] J. Henderson, J. Condell, J. Connolly, D. Kelly, K. Curran, "Review of wearable sensor-based health monitoring glove devices for rheumatoid arthritis," Sensors, vol. 21 no. 5, DOI: 10.3390/s21051576, 2021.
[13] K. Lee, M. Filannino, Ö. Uzuner, An empirical test of GRUs and deep contextualized word representations on de-identification, 2019.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[15] Z. Mengfan, Research on Text Semantic Similarity Algorithm Based on Transformer, 2020.
Copyright © 2022 Jinghong Qi and Xinli Jia. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/
Abstract
In order to improve the accuracy of ideological and political education (IPE) text scoring, an improved short-text similarity calculation model based on the transformer is proposed. This model takes the DSSM model as its basic framework and uses the BERT model to realize text representation and solve the polysemy problem. The transformer encoding component is used to extract the characteristics of the text and obtain its internal information. With the help of the encoding component, the two texts can exchange information on multiple levels. Finally, the semantic similarity between the two texts is calculated by concatenation vector inference.