This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Teaching computers to pass entrance examinations at different education levels is an increasingly popular artificial intelligence challenge that researchers in several countries have taken up in recent years [1–3]. The Todai Robot Project [3] aims to develop a problem-solving system that can pass the University of Tokyo’s entrance examination. China has launched a similar project, “key technology and system for language question solving and answer generation,” which focuses on human-like QA systems for the College Entrance Examination (commonly known as Gaokao). Gaokao is a nationwide standardized examination for all senior middle school students in China and is known for its large scale and strictness.
Although deep learning methods have achieved good results in many natural language processing tasks [4–7], they usually rely on large-scale datasets for effective training, and sufficient training data are not currently available for the Gaokao task. Typical QA tasks such as SQuAD [8], DuReader [9], and CMRC2018 [10] benefit from a very large set of known QA pairs. In contrast, the task considered here amounts to retrieving a proper answer from a background article, guided by only a very limited number of known QA pairs. In addition, the questions are usually posed implicitly and require students to dig out the exact intended meaning of the facts concerned. If this meaning is not captured by the feature representation of either the question or the answer, retrieval will hardly succeed.
Generally speaking, knowledge sources for the Gaokao challenge are extensive and no sufficiently large structured dataset is available, while most existing work on knowledge representation has focused on structured and semistructured types [11–14]. For answer retrieval, there are models based on semantic resources such as HowNet [15], WordNet [16], and Synonym Cilin [17]. Reference [18] proposed a sentence semantic relevance calculation method based on a multidimensional voting algorithm. This method treats the semantic relevance of different dimensions as a metric and uses voting to select the best option for the problem. Reference [19] proposed a title selection method based on a correlation matrix between the title and the main points of the chapter. Reference [20] proposed a method for extracting candidate sentences based on frame matching and frame relationship matching and then used manifold ranking to sort the candidate sentences.
This work focuses on reading comprehension question answering in Gaokao Chinese examinations, which accounts for a large proportion of the total score and is among the most difficult parts of the exams. Reference [2] made a preliminary attempt to take up the Gaokao challenge and proposed a three-stage approach that exploits and extends information retrieval techniques. In contrast, our task is to solve reading comprehension questions and has to rely on deep semantic representation and computation rather than the word matching used in previous work. Table 1 shows an example question from a Chinese exam, consisting of a question and its answer. Some answer sentences are difficult to retrieve through literal matching, and the answer sentences are not confined to a single paragraph but are spread across different paragraphs of different articles. For instance, the question would be confusing without knowledge of the background articles on making cultural relics “live.” In addition, some answers summarize the article from different paragraphs, while other answers summarize the author’s point of view. How to retrieve answers hidden in scattered paragraphs is a major challenge and is key to improving the performance of a QA system for Gaokao.
Table 1
Example of reading comprehension QA in College Entrance Examination.
| 2017 Beijing College Entrance Examination question |
| Question: 请结合上述三则材料, 简述让文物“活”起来的含义与作用 |
| Please combine the above three materials to briefly describe the meaning and function of making cultural relics “live.” |
| 答案, 利用博物馆、各种现代技术让参观者近距离感悟文物的魅力。发挥它们在公众知史爱国, 鉴物审美, 以及技艺传承、文化养心的作用, 实现学术、趣味性统一, 以新鲜时尚的方式提供给观众审美与求知、娱乐与鉴赏的多元文化体验, 借助计算机等生成三维环境, 调动多感官, 带来沉浸感, 使用现代技术使得文物呈现方式灵活, 让更多的人喜欢上古文化, 更好地实现文物走近大众的作用。解决了展出空间有限、文物损毁等问题, 起到更好地保护文物的作用 |
| Answer: use museums and various modern technologies to make visitors feel the charm of cultural relics up close. Play their role in public knowledge of history, patriotism, appreciation of objects, as well as technical inheritance, and cultural cultivation; achieve the unity of academic and interesting; provide audiences with a multicultural experience of aesthetics and knowledge, entertainment, and appreciation in a fresh and fashionable way; and use computers to generate a three-dimensional environment, mobilize multiple senses, and bring immersion; the use of modern technology makes the presentation of cultural relics flexible, so that more people like ancient culture, and better realize the role of cultural relics reaching the public. (paragraph topic sentence) It solves the problems of limited exhibition space and damage to cultural relics and plays a better role in protecting cultural relics. (author’s opinion sentence) |
The challenges of our task call for a new problem-solving framework for automatically answering comprehensive questions in exams. We propose a graph-based framework, shown in Figure 1. First, we preprocess the articles and questions and extract the evidence. Second, the Chinese FrameNet and discourse topic are used to construct the affinity matrix, which preserves the results of the semantic analysis of the question and each sentence. Finally, a graph-based ranking algorithm scores each candidate sentence, and the candidate sentences most relevant to the question are returned as the answer.
[figure omitted; refer to PDF]
Our contribution is threefold. (1) After showing Gaokao’s difficulty and its difference from existing research problems, we propose a new framework for reading comprehension QA in Gaokao. To the best of our knowledge, this is the first application of a graph-based algorithm to reading comprehension QA. (2) To the best of our knowledge, the relationship between candidate sentences has not previously been taken into account in the QA task. Our method treats this relationship as a factor and extracts the answer sentences with a unified model to improve the answering performance of the QA system. (3) Our approach achieves encouraging results on a set of real-life questions collected from recent Chinese examinations. We also release a Chinese comprehensive deep question-answering dataset to facilitate research.
2. Reading Comprehension QA Method Based on Graph
2.1. Method Framework
The graph-based model [21] was first used by search engines to calculate the importance of webpages. It has since been used successfully in many tasks, such as object retrieval [22], keyword extraction [23], and automatic summarization [24]. The algorithm is based on the following two assumptions. (1) Quantity assumption: in the web graph model, if page A is linked to by many other webpages, then page A is more important. (2) Quality assumption: if page A is linked to by higher-quality pages, then page A is more important. The reading comprehension QA graph proposed in this paper is derived from the PageRank model. The model makes full use of the correlation between the question and the candidate sentences, and a global optimization ranking model is used to extract and sort the answer candidate sentences. The model is based on the following three hypotheses. (1) Quantity hypothesis: if an answer candidate sentence is associated with more other sentences, then it is more likely to be an answer sentence. (2) Quality hypothesis: if an answer candidate sentence is associated with other sentences of higher quality, then it is more likely to be an answer sentence. (3) Link weight hypothesis: the higher the degree of correlation between the question and an answer candidate sentence, the more likely the candidate is an answer sentence.
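The quantity and quality assumptions can be illustrated with a minimal power-iteration sketch of PageRank. This is only an illustration under stated assumptions: the damping factor 0.85, the iteration count, and the toy graph `toy_links` are choices made for this example, not values taken from this paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Assumes every link target also appears as a key of `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small uniform "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: A is linked to by B, C, and D (quantity).
# B and D each have a single in-link, but B's comes from the important
# page A while D's comes from the unimportant page C (quality).
toy_links = {"A": ["B"], "B": ["A"], "C": ["A", "D"], "D": ["A"]}
toy_ranks = pagerank(toy_links)
```

In this toy graph, A ranks highest because it has the most in-links, and B outranks D even though each has one in-link, because B's in-link comes from a more important page.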
This paper makes use of the “voting” or “recommendations” between the question and sentences in the QA problem. The graph for reading comprehension QA is shown in Figure 2. The squares represent the candidate sentences
[figure omitted; refer to PDF]
In this paper, the function
Algorithm 1: QA algorithm for reading comprehension based on a graph.
Input: question
Output: top 6 answer candidate sentences.
(1) Calculate the relationship between the question
(2) Calculate the relationship between each candidate sentence through the word similarity. If the degree of relationship between two nodes is greater than 0, the nodes are connected by an edge. Construct the affinity matrix
(3) Combine the affinity matrix and normalize it. Define
(4) Iterate
(5) Use
Return top 6 candidate sentences with the highest score.
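The overall flow of Algorithm 1 can be sketched as a question-biased random walk over the candidate sentences. The exact update formula is not reproduced in this section, so the form used below (a PageRank-style iteration whose restart distribution is the normalized question relevance) is an assumption, as are the names `rank_sentences`, `question_rel`, `sentence_aff`, and the damping value 0.85.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def rank_sentences(question_rel, sentence_aff, damping=0.85, top_k=6,
                   iterations=100):
    """Return (indices of the top_k sentences, all scores).

    question_rel[i]    : relevance of sentence i to the question (step 1)
    sentence_aff[i][j] : relationship between sentences i and j (step 2);
                         a value of 0 means there is no edge
    """
    n = len(question_rel)
    # Drop self-loops so that a sentence does not vote for itself.
    aff = [[0.0 if i == j else sentence_aff[i][j] for j in range(n)]
           for i in range(n)]
    transition = row_normalize(aff)  # step 3 (assumed row normalization)
    q_total = sum(question_rel)
    bias = ([q / q_total for q in question_rel] if q_total > 0
            else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):  # step 4 (assumed update rule)
        scores = [
            damping * sum(scores[i] * transition[i][j] for i in range(n))
            + (1.0 - damping) * bias[j]
            for j in range(n)
        ]
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores  # step 5


# Hypothetical relevance values for four candidate sentences.
question_rel = [0.9, 0.1, 0.0, 0.5]
sentence_aff = [
    [0.0, 0.2, 0.0, 0.6],
    [0.2, 0.0, 0.3, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0],
]
top, scores = rank_sentences(question_rel, sentence_aff, top_k=2)
```

Sentence 2 has no direct relevance to the question but still receives a nonzero score through its edge to sentence 1, which is the kind of importance transfer between candidate sentences that the graph is meant to capture.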
In the first step of the algorithm, the relationship between the question
In the second step of the algorithm, because the task of this paper is automatic QA, answer candidate sentences need to be extracted. The importance transmitted between candidate sentences should be related to the question, and importance unrelated to the question should not be transmitted. Therefore, the following formula is used to calculate the relationship between candidate sentences:
In the third step of the algorithm, the high-quality answer sentences are all explanations and answers to the question. The extraction effect depends largely on the relationship between the candidate sentences and the question, and it is less affected by the relationship between the candidate sentences. Therefore, different weights should be set for the affinity matrix of the two parts.
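One way to give the two parts different weights is to scale the question-to-sentence edges and the sentence-to-sentence edges separately before merging them into a single matrix. The sketch below is a hypothetical illustration: the function name `combine_affinity`, the layout with the question as node 0, and the weight 0.7 are assumptions, not the paper's settings.

```python
def combine_affinity(question_rel, sentence_aff, question_weight=0.7):
    """Merge both parts into one (n+1) x (n+1) matrix; node 0 is the question.

    Question-sentence edges are scaled by question_weight and
    sentence-sentence edges by (1 - question_weight).
    """
    n = len(question_rel)
    size = n + 1
    combined = [[0.0] * size for _ in range(size)]
    for i in range(n):
        weighted = question_weight * question_rel[i]
        combined[0][i + 1] = weighted
        combined[i + 1][0] = weighted
        for j in range(n):
            if i != j:  # no self-loops
                combined[i + 1][j + 1] = (
                    (1.0 - question_weight) * sentence_aff[i][j]
                )
    return combined


# Hypothetical values for two candidate sentences.
combined = combine_affinity([0.8, 0.2], [[1.0, 0.5], [0.5, 1.0]])
```

A `question_weight` above 0.5 reflects the observation that extraction depends more on the question-to-sentence relationship than on the relationship between candidate sentences.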
In the fourth step of the algorithm,
By counting the suggested answers of the examination papers over several years, it is found that the average number of answer sentences is 6. If the number of outputs is less than 6 sentences, it is not enough to cover all answer points; if the number of outputs is greater than 6 sentences, the redundancy of the output answers is high. Finally, the top 6 candidate sentences are selected as answer sentences by the algorithm.
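The trade-off behind the cutoff can be made concrete by measuring recall and precision at different output sizes. The metric definitions used in the experiments are not shown in this section, so the sentence-level definitions below, the function name `recall_precision_at_k`, and the toy ranking are assumptions made for illustration.

```python
def recall_precision_at_k(ranked, gold, k):
    """Sentence-level recall and precision of the top-k ranked sentences."""
    selected = ranked[:k]
    hits = sum(1 for s in selected if s in gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(selected) if selected else 0.0
    return recall, precision


# Hypothetical ranking of ten candidate sentences and six gold answers.
ranked = [3, 7, 1, 9, 4, 0, 8, 2, 5, 6]
gold = {3, 1, 4, 0, 8, 6}

r4, p4 = recall_precision_at_k(ranked, gold, 4)     # too few outputs
r6, p6 = recall_precision_at_k(ranked, gold, 6)     # cutoff equal to 6
r10, p10 = recall_precision_at_k(ranked, gold, 10)  # everything returned
```

In this toy case, a small cutoff misses answer points (low recall), while returning every sentence reaches full recall only by padding the output with non-answer sentences.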
2.2. Calculation of the Relationship between the Question and Candidate Sentences
The calculation of the relationship between the question
2.2.1. Answer Sentence Extraction Based on Similarity Measure
First, preprocess the sentence, including word segmentation and removal of stop words.
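The preprocessing step can be sketched as follows. To keep the example self-contained, it assumes the sentences have already been segmented into space-separated words (in practice a segmenter such as LTP [29] would do this), uses a tiny illustrative stop-word list, and substitutes bag-of-words cosine similarity for the paper's similarity measure, which is not reproduced here.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; not the list used in the paper.
STOP_WORDS = {"的", "了", "是", "在", "与"}


def preprocess(segmented_sentence):
    """Split a pre-segmented (space-separated) sentence and drop stop words."""
    return [w for w in segmented_sentence.split() if w not in STOP_WORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical pre-segmented question and candidate sentence.
question_tokens = preprocess("让 文物 活 起来 的 含义 与 作用")
sentence_tokens = preprocess("文物 保护 的 作用")
similarity = cosine_similarity(question_tokens, sentence_tokens)
```

Literal overlap of this kind only rewards shared surface words, which is the limitation that motivates the frame-based matching in the next subsection.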
2.2.2. Answer Sentence Extraction Based on Frame Matching
Since the method based on similarity measure cannot mine the deep semantic information of the sentences in Gaokao, this paper uses the Chinese Frame Network (CFN) [26] to capture the semantic information in the semantic scene. CFN is a Chinese vocabulary semantic knowledge base established by Shanxi University; it is based on FrameNet [27] of the University of California, Berkeley.
(1) Frame semantic matching: when the frame evoked by the target word of the question
An example of candidate sentence extraction based on frame matching is shown in Figure 3. The frame evoked by the target word “development” in the question is the same as the frame evoked by the target word “enhance” in the candidate sentence, and the frame evoked by “development” in the question is related to the frame evoked by the target word “carry out” in the candidate sentence. The scenes involved are relevant, and the distance is less than or equal to 2. Therefore, sentence S is extracted as an answer candidate sentence based on frame matching.
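The distance condition can be checked with a breadth-first search over a frame relation graph. The sketch below is a hypothetical illustration: the graph `relations` and the frame names are placeholders rather than real CFN frames, and treating frame relations as undirected edges is an assumption.

```python
from collections import deque


def frame_distance(relations, source, target, max_depth=2):
    """Number of relation hops from source to target, or None if > max_depth."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        frame, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the allowed distance
        for neighbor in relations.get(frame, ()):
            if neighbor == target:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None


def frames_match(relations, question_frames, sentence_frames, max_depth=2):
    """True if any question frame is within max_depth of any sentence frame."""
    return any(
        frame_distance(relations, q, s, max_depth) is not None
        for q in question_frames
        for s in sentence_frames
    )


# Placeholder frame relation graph (undirected, stored in both directions).
relations = {
    "F_question": ["F_mid"],
    "F_mid": ["F_question", "F_near"],
    "F_near": ["F_mid", "F_far"],
    "F_far": ["F_near"],
}
same = frame_distance(relations, "F_question", "F_question")
near = frame_distance(relations, "F_question", "F_near")
far = frame_distance(relations, "F_question", "F_far")
```

A distance of 0 corresponds to the case where the question and the candidate sentence evoke the same frame, while distances of 1 or 2 correspond to related frames.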
[figure omitted; refer to PDF]
Figure 6 shows that our method improves the results on real and simulated questions from different provinces. At the same time, the recall for some provinces is relatively low, as illustrated by Example 2.
Example 2.
2009 Liaoning College Entrance Examination questions.
Question: “通俗历史热”在当今出现的原因是什么?
What is the reason for the emergence of “popular history fever” today?
Answer: “通俗历史热”是商品经济和文化教育发展到一定程度后定会出现的一种现象。
当商品经济趋于发达、文化教育发展迅速的时候, 人们在从事赖以谋生的职业活动之外, 带有文化色彩的业余需求会随之增长, 对作为文化存在常见形态之一的历史知识, 其“求解”欲望也会趋于强烈。
在当今市场经济逐步成熟、文化教育普及程度大为提高、高等教育开始走向大众化的时代, 人们的业余文化需求显著增长, 久远的尘封旧事引起了人们日益浓厚的兴趣。
对于广大民众而言, 在古奥难懂的传统史著和“学术模式”的现代史书皆难“卒读”的情况下, 通俗化的历史几乎成为他们“探寻过去”的唯一选择。
“Popular history fever” is a phenomenon that is bound to appear once the commodity economy and cultural education have developed to a certain extent.
When the commodity economy becomes more developed and cultural education develops rapidly, then beyond the occupations people rely on to make a living, their culturally tinged leisure demand grows accordingly, and their desire to “seek answers” about historical knowledge, one of the common forms in which culture exists, also becomes stronger.
In today’s era, when the market economy is gradually maturing, cultural education is far more widespread, and higher education has begun to reach the general public, people’s leisure-time cultural needs have increased significantly, and long-forgotten events of the distant past have aroused people’s growing interest.
For the general public, since both abstruse traditional historical works and modern history books in the “academic mode” are hard to “read through,” popularized history has become almost their only choice for “exploring the past.”
Analyzing the reasons, we find the following. (1) The background material is organized around the concept of “popular history fever,” so many candidate sentences related to the question are not answer sentences, and distinguishing them requires deep semantic understanding and reasoning. (2) There is a large semantic gap between “原因” (reason) in the question and words such as “desire,” “demand,” “interest,” and “choice” in the answer sentences. It is difficult to bridge this gap with semantic matching based on existing tools such as HowNet, Word2Vec, and CFN.
The accuracy of extracting paragraph topic sentence and author’s opinion sentence.
We annotated the paragraph topic sentences and author’s opinion sentences in twelve years of Beijing College Entrance Examination papers. There are 19 materials, 89 paragraph topic sentences, and 26 author’s opinion sentences. The experimental results are shown in Table 5.
Through the analysis of College Entrance Examination papers, it is found that, compared with the general news articles, it is more difficult to extract the topic sentence of the paragraph. As shown in Example 3, the topic sentence of the paragraph is “Singing Kunqu Opera is something in the hall” which is a concise summary of the paragraph. However, the similarity between topic sentences and other sentences is small, so it needs deeper semantic reasoning technology. The difficulty of extracting the author’s opinion sentences is that some articles do not have a clear author’s opinion. As shown in Example 3, the full text consists of four paragraphs. The first paragraph introduces “Kunqu Opera,” and the next three paragraphs illustrate the strengths and limitations of “Kunqu Opera” from different perspectives, but there is no obvious general view and attitude.
Table 5
Experimental results of the paragraph topic sentence and author’s opinion sentence.
| Method | P (%) |
| Paragraph topic sentence recognition | 80.62 |
| Author’s opinion sentence recognition | 75.00 |
Example 3.
2009 Beijing College Entrance Examination
演唱昆曲是厅堂里的事情。地上铺了一方红地毯, 就算是剧中的境界, 唱的时候, 笛子是主要的乐器, 声音当然不会怎么响, 但是在一个厅堂里, 也就各处听得见了。搬上旧式的戏台去, 即使在一个并不宽广的戏院子里, 就不及平剧那样容易叫全体观众听清。如果搬上新式的舞台去, 那简直没有法子听, 大概坐在第五六排的人就只看见演员拂袖按鬓了。
Singing Kunqu Opera is something in the hall. A square of red carpet laid on the floor serves as the setting of the play. When singing, the flute is the main instrument, so the sound is of course not very loud, but in a hall it can be heard everywhere. Moved onto an old-style stage, even in a theater that is not very large, it is not as easy as Peking opera (Pingju) for the whole audience to hear clearly. Moved onto a new-style stage, there is simply no way to hear it; people sitting in the fifth or sixth row would probably only see the actors flicking their sleeves and touching their temples.
4. Conclusion
After showing Gaokao’s difficulty and its difference from existing research problems, we propose a new framework for reading comprehension QA in Gaokao. The method first uses word similarity matching, frame matching, and discourse topic to construct the affinity matrix, which includes both the relationship between the question and the candidate sentences and the relationship among the candidate sentences. It then uses a graph-based algorithm to calculate the score of each sentence. Finally, the top 6 sentences are chosen as the answer sentences. At present, the deep reasoning ability of our method is not strong enough. In addition, the method is extractive and cannot generate answers automatically, so the scoring rate of the system is not high. In future work, we will pursue deeper semantic understanding and reasoning over the background article and study a more efficient method. We will also collect more relevant corpora to expand the scale of the data and improve the answering performance of the system.
Acknowledgments
This work was supported in part by the National Key R & D Projects (2018YFB1005103), the National Natural Science Foundation of China (61772324), and the 1331 Engineering Project of Shanxi Province of China.
[1] S. Guo, X. Zeng, S. He, K. Liu, J. Zhao, "Which is the effective way for Gaokao: information retrieval or neural networks?," Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics EACL, pp. 111-120.
[2] C. Gong, W. Zhu, Z. Wang, J. Chen, Y. Qu, "Taking up the Gaokao challenge: an information retrieval approach," Proceedings of the 2016 International Joint Conference on Artificial Intelligence IJCAI, pp. 2479-2485.
[3] A. Fujita, A. Kameda, K. Ai, Y. Miyao, "Overview of Todai robot project and evaluation framework of its NLP-based problem solving," Proceedings of the International Conference on Learning Representations ICLR, pp. 2590-2597.
[4] M. Feng, B. Xiang, M. R. Glass, L. Wang, B. Zhou, "Applying deep learning to answer selection: a study and an open task," Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding ASRU, pp. 813-820, DOI: 10.1109/ASRU.2015.7404872.
[5] J. Chen, Q. Zhang, P. Liu, X. Qiu, X. Huang, "Implicit discourse relation detection via a deep architecture with gated relevance network," Proceedings of the ACL, pp. 1726-1735.
[6] L. Qin, Z. Zhang, H. Zhao, Z. Hu, E. P. Xing, "Adversarial connective-exploiting networks for implicit discourse relation classification," Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), pp. 1006-1017.
[7] J. Cai, S. He, Z. Li, H. Zhao, "A full end-to-end semantic role labeler, syntactic-agnostic or syntactic-aware?," Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018).
[8] P. Rajpurkar, "SQuAD: 100,000+ questions for machine comprehension of text," pp. 2383-2392.
[9] W. He, "DuReader: a Chinese machine reading comprehension dataset from real-world applications," Proceedings of the MRQA 2018, pp. 37-46.
[10] Y. Cui, "A span-extraction dataset for Chinese machine reading comprehension," pp. 5883-5889.
[11] D. Xiong, "A similarity calculation method of community question and answer based on LDA," Journal of Chinese Information Processing, vol. 26 no. 5, pp. 40-46, 2012.
[12] Z. Ye, "Research on open domain question answering system," pp. 527-540.
[13] L. T. Le, C. Shah, E. Choi, "Assessing the quality of answers autonomously in community question–answering," International Journal on Digital Libraries, vol. 20 no. 4, pp. 351-367, DOI: 10.1007/s00799-019-00272-5, 2019.
[14] C. Li, "Syntactic analysis and deep neural network in answer extraction of Chinese question answering system," Journal of Chinese Mini-Micro Computer Systems, vol. 38 no. 6, pp. 1341-1346, 2017.
[15] Q. Liu, "Semantic similarity of vocabulary based on HowNet," International Journal of Computational Linguistics & Chinese Language Processing, vol. 7 no. 2, pp. 59-76, 2002.
[16] W. T. Yih, "Question answering using enhanced lexical semantic models," pp. 1744-1753.
[17] Y. Zhou, "A method of sentence semantic similarity based on synonym forest and its application in question answering system," Computer Applications and Software, vol. 36 no. 8, pp. 65-68, 2019.
[18] S. Guo, "Sentence semantic relevance for college entrance examination reading comprehension," Journal of Tsinghua University (Science and Technology), vol. 57 no. 6, pp. 575-579, 2017.
[19] Y. Guan, "A study on the selection of text titles for Chinese reading comprehension in college entrance examination," Journal of Chinese Information Processing, vol. 32 no. 6, pp. 28-35, 2018.
[20] G. Li, "The extraction of answer sentences from Chinese reading comprehension of college entrance examination based on frame semantics," Journal of Chinese Information Processing, vol. 30 no. 6, pp. 164-172, 2016.
[21] L. Page, "The PageRank citation ranking: bringing order to the web," Technical Report, 1999.
[22] C. Fan, Research on PageRank Algorithm in Web Structure Mining, 2009.
[23] J. Liu, "Keyword extraction based on language network," pp. 711-715.
[24] X. Wan, "An exploration of document impact on graph-based multi-document summarization."
[25] M. Liu, Sentence Similarity Calculation Based on Word Vector and Its Application in Case-Based Machine Translation, 2015.
[26] R. Li, Research on the Semantic Structure Analysis Technology of Chinese Sentence Frame, 2012.
[27] C. F. Baker, "The Berkeley FrameNet project," pp. 86-90.
[28] HIT IR-Lab Tongyici Cilin (Extended), http://www.ir-lab.org/
[29] W. Che, "LTP: a Chinese language technology platform," pp. 13-16.
[30] J. Devlin, "BERT: pre-training of deep bidirectional transformers for language understanding," pp. 4171-4186.
Copyright © 2020 Zhizhuo Yang et al. Licensed under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
Abstract
Reading comprehension Question-Answering (QA) for College Entrance Examination (Gaokao in Chinese) is a challenging AI task because it requires effective representation to capture complicated semantic relations between the question and answers. In this paper, a novel method of Chinese Automatic Question-Answering based on a graph is proposed. The method first uses the Chinese FrameNet and discourse topic (paragraph topic sentence and author’s opinion sentence) to construct the affinity matrix between the question and candidate sentences and then employs the algorithm based on the graph to iteratively calculate the importance of each sentence. At last, the top 6 candidate answer sentences are selected based on the ranking scores. The recall on Beijing College Entrance Examination in the recent twelve years is 67.86%, which verifies the effectiveness of the method.
Authors: Li, Chunzhuan 1; Zhang, Hu 1; Qian Yili 1; Li, Ru 2
1 School of Computer and Information Technology of Shanxi University, Taiyuan, Shanxi 030006, China
2 School of Computer and Information Technology of Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computation Intelligence and Chinese Information Processing, Taiyuan, Shanxi 030006, China





