Featured Application
Knowledge extraction technology can be applied to many scenarios, such as factual knowledge graph construction. It can extract named entities, their relationships, attributes, and concepts from massive unstructured text data, and the resulting factual knowledge graphs can be used in search engines and recommendation systems. Knowledge extraction can also be used to build vertical-domain knowledge graphs, such as a medical knowledge graph covering symptoms, diseases, drugs, surgeries, treatment methods, etc. A large amount of medical knowledge can be extracted from the medical literature and electronic medical records to assist doctors with disease diagnosis in clinical decision support systems (CDSS). Knowledge extraction can also be used to analyze dialogues between patients and doctors: it extracts the patient's information from the medical dialogue, supports intelligent pre-consultation and guidance services, and helps generate electronic medical records. In dialogue systems, knowledge extraction can be used to understand the user's query intention and to perform slot extraction. In a ticket booking scenario, for example, it can extract customer booking information such as departure city, arrival city, time, and preferences.
Abstract
In practical knowledge extraction systems, different applications have different entity classes and relationship schemas, so the ability of a knowledge extraction model to generalize and transfer is very important. Open-domain knowledge extraction, in which a model is trained in a source domain and applied directly to an arbitrary target domain, is therefore crucial for addressing these generalization and transfer issues. Traditional knowledge extraction models cannot be transferred directly to new domains and cannot extract undefined relation types. To deal with these issues, in this paper we propose an end-to-end Chinese open-domain knowledge extraction model, TPORE (Extract Open-domain Relations through Token Pair linking), which combines BERT with a handshaking tagging scheme. TPORE can alleviate the nested entity and nested relation problems. Additionally, we adopt a new loss function that performs a pairwise comparison of target category scores and non-target category scores to balance their weights automatically, and the experimental results indicate that this loss function improves both speed and performance. Extensive experiments demonstrate that the proposed method significantly surpasses strong baselines. Specifically, our approach achieves new state-of-the-art results on Chinese open Relation Extraction (ORE) benchmarks (COER and SAOKE). On the COER dataset, F1 increased from 66.36% to 79.63%, and on the SpanSAOKE dataset, F1 increased from 46.0% to 54.91%. In the medical domain, our method obtains performance close to that of the state-of-the-art (SOTA) method on the CMeIE and CMeEE datasets.
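The abstract does not give the formula for the loss, so the following is only a minimal sketch of one common way to compare target and non-target category scores pairwise. It assumes each category has a raw score `s`, that a score above 0 means "predicted", and that the loss is log(1 + Σ exp(s_i)) over non-target categories plus log(1 + Σ exp(-s_j)) over target categories. The function names and this exact form are illustrative assumptions, not the authors' confirmed implementation.

```python
import math


def _log1p_sum_exp(values):
    """Numerically stable log(1 + sum(exp(v) for v in values))."""
    # Adding 0.0 to the list accounts for the "1 +" term, since exp(0) = 1.
    vals = [0.0] + list(values)
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def pairwise_multilabel_loss(scores, targets):
    """Assumed form: log(1 + sum_neg e^{s_i}) + log(1 + sum_pos e^{-s_j}).

    scores  : raw (unnormalized) score per category
    targets : 1 for a target category, 0 for a non-target category

    Multiplying out the two terms yields exp(s_i - s_j) for every
    (non-target i, target j) pair, so each non-target score is compared
    with each target score, with 0 acting as the decision threshold.
    No manual class weights are needed even when targets are very sparse.
    """
    neg = [s for s, t in zip(scores, targets) if t == 0]
    pos = [-s for s, t in zip(scores, targets) if t == 1]
    return _log1p_sum_exp(neg) + _log1p_sum_exp(pos)


if __name__ == "__main__":
    # Target scored high and non-target scored low: loss is close to 0.
    print(pairwise_multilabel_loss([10.0, -10.0], [1, 0]))
    # Scores swapped: the loss becomes large.
    print(pairwise_multilabel_loss([-10.0, 10.0], [1, 0]))
```

In a token-pair tagging setting, `scores` would hold one entry per (token pair, tag) candidate, most of which are non-targets. That is the imbalanced situation in which a loss of this shape is typically preferred over per-label binary cross-entropy.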
Details
; Pun, Sio Hang 2; Vai, Mang I 3; Yang, Yifan 4; Miao, Qingliang 4
1 Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Macau 999078, China
2 State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China
3 Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Macau 999078, China
4 AI Speech Co., Ltd., Building 14, Tengfei Science and Technology Park, No. 388, Xinping Street, Suzhou Industrial Park, Suzhou 215000, China