Abstract

Relation extraction is a fundamental task in natural language processing that aims to identify structured relational triples in unstructured text. In recent years, research on relation extraction has gradually advanced from the sentence level to the document level. Most existing document-level relation extraction (DocRE) models are fully supervised, so their performance is limited by dataset quality. However, existing DocRE datasets suffer from omitted annotations, which makes fully supervised models unsuitable for real-world scenarios. To address this issue, we propose a DocRE method based on uncertainty pseudo-label selection. This method first trains a teacher model to annotate pseudo-labels for a dataset with incomplete annotations, then trains a student model on the pseudo-labeled dataset, and finally uses the trained student model to predict relations on the test set. To mitigate the confirmation bias problem in pseudo-label methods, we perform adversarial training on the teacher model and calculate the uncertainty of the model output to supervise the generation of pseudo-labels. In addition, to address the imbalance between hard and easy samples, we propose an adaptive hard-sample focal loss. This loss guides the model to pay less attention to easy-to-classify samples and outliers and more attention to hard-to-classify samples. We conducted experiments on two public datasets, and the results demonstrate the effectiveness of our method.
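The abstract does not specify how uncertainty is computed or what form the adaptive loss takes, so the following is only a minimal sketch under stated assumptions. It assumes the teacher's uncertainty can be estimated as the standard deviation over several stochastic forward passes, and it uses the standard focal loss as a stand-in for the adaptive hard-sample focal loss. The standard form down-weights easy samples only; unlike the loss described above, it does not reduce attention to outliers. The function names, thresholds, and array shapes are hypothetical.

```python
import numpy as np

def select_pseudo_labels(prob_samples, prob_threshold=0.5, max_std=0.05):
    """Keep teacher predictions that are both confident and stable.

    prob_samples: array of shape (T, N, R) holding relation probabilities
    from T stochastic teacher passes over N entity pairs and R relations.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    mean = prob_samples.mean(axis=0)   # average confidence per (pair, relation)
    std = prob_samples.std(axis=0)     # spread across passes as an uncertainty proxy
    return (mean > prob_threshold) & (std < max_std)

def merge_labels(annotated, selected):
    """Add the selected pseudo-labels to the incomplete gold annotations."""
    return np.logical_or(annotated.astype(bool), selected).astype(int)

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Standard binary focal loss, used here only as a stand-in.

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples; gamma = 0 recovers ordinary binary cross-entropy.
    """
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

# Two teacher passes, one entity pair, two candidate relations.
samples = np.array([[[0.90, 0.95]],
                    [[0.90, 0.35]]])
selected = select_pseudo_labels(samples)             # stable vs. unstable prediction
student_targets = merge_labels(np.array([[0, 0]]), selected)
losses = focal_loss(np.array([0.9, 0.1]), np.array([1, 1]))
```

In this sketch the student would be trained on `student_targets`, which contain the original annotations plus only those pseudo-labels the teacher predicts consistently.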

Details

Title
Document-Level Relation Extraction with Uncertainty Pseudo-Label Selection and Hard-Sample Focal Loss
Author
Wang, Hongbin 1 ; Yu, Shuning 1 ; Yantuan, Xian 1 

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, 727 Jingmingnan Road, Kunming, Yunnan 650500, China [email protected]
Volume
28
Issue
2
Pages
361-370
Publication year
2024
Publication date
Mar 2024
Publisher
Fuji Technology Press Co. Ltd.
Place of publication
Tokyo
Country of publication
Japan
Publication subject
ISSN
13430130
e-ISSN
18838014
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-03-20
Milestone dates
2023-08-05 (Received); 2023-10-31 (Accepted)
First posting date
20 Mar 2024
ProQuest document ID
2967064441
Document URL
https://www.proquest.com/scholarly-journals/document-level-relation-extraction-with/docview/2967064441/se-2?accountid=208611
Copyright
Copyright © 2024 Fuji Technology Press Ltd.
Last updated
2024-08-26
Database
ProQuest One Academic