1 Introduction
In recent years, large language models (LLMs) have achieved significant breakthroughs across various natural language processing (NLP) tasks [1–4]. Information extraction (IE) is an NLP task that aims to extract explicit, structured knowledge from text. However, recent work indicates that LLMs still exhibit a significant performance gap on IE tasks [5–8]. To improve LLM performance on IE, Unified Information Extraction (UIE) [9] proposes a schema-based prompt mechanism that effectively facilitates the extraction of structured knowledge, including entities, relations, and events. Earlier models typically required task-specific architectures for different information extraction tasks; UIE instead employs a single model that supports multiple tasks, such as named entity recognition (NER), relation extraction (RE), and event extraction (EE). Subsequent research has focused on schema design and dataset construction, demonstrating the effectiveness of employing schemas in information extraction tasks [10].
Current studies often simplify schema construction, which may adversely affect the performance of information extraction models. A common approach involves concatenating all schemas from the dataset to create a training set, allowing the inference process to utilize any subset of these schemas. This inconsistency between training and inference can undermine the model’s reasoning capabilities, while employing all schemas during inference is also impractical. To tackle this challenge, we propose a chunk-based schema instruction construction strategy.
Utilizing the state-of-the-art LLMs, Qwen2-7B-Chat [11] and Llama3.1-8B-Chat [12], we employ the LoRA [13] technique to conduct experiments on multiple Chinese and English datasets for NER, RE, and EE tasks to validate the performance improvements of our proposed method under zero-shot conditions. Experimental results show that discrepancies in schema distributions during training and inference can negatively impact model performance. Additionally, ablation studies indicate that focusing on challenging negative schemas enhances the model’s ability to distinguish similar patterns.
The main contributions of this study are summarized as follows:
1. We propose a chunk-based schema instruction construction method. This approach divides schemas using a carefully designed sampling strategy based on an explicitly specified chunk size, ensuring that the schema distributions during training and inference are as similar as possible. This alignment helps mitigate the adverse effects on model performance resulting from significant differences in schema patterns. Furthermore, our method constructs the training set in a comprehensive and explicit manner, enriching the variety of schema patterns included.
2. We introduce a predefined dictionary of challenging schemas. The schema segmentation process prioritizes the co-occurrence of schemas corresponding to positive label with those of challenging negative labels, thereby enhancing the model’s ability to discern similar semantics and reducing confusion over similar patterns.
3. Extensive experiments conducted on publicly available Chinese and English datasets demonstrate that our proposed method significantly enhances the zero-shot performance of existing LLMs in NER, EE, and RE tasks.
The paper is organized as follows: Sect 2 reviews related work. Sect 3 presents the design details of the proposed ChunkUIE model. Sect 4 describes the experimental setup, including the datasets, evaluation metrics, and implementation details, along with quantitative results and analysis. Sect 5 concludes the paper.
2 Related works
2.1 Information extraction models
Large language models have demonstrated significant performance improvements across various language tasks, with some models capable of supporting multiple tasks simultaneously. In the domain of information extraction, these models also hold promise for addressing the generalization challenges associated with unseen labels. Lu’s study [9] seeks to adaptively adjust the structure and requirements of information extraction by incorporating schemas into instructions, resulting in a versatile model that has become a mainstream approach in subsequent UIE research.
Fine-tuning open-source LLMs is currently a prevalent method in the information extraction field. For instance, InstructUIE [10] employs the 11B FlanT5 [14] as its backbone, achieving performance comparable to ChatGPT-3.5 [15] through instruction-based fine-tuning. Similarly, YAYI-UIE [16] utilizes the Baichuan2-13B [17] backbone model, enhancing performance on NER, RE, and EE tasks through dialog-augmented instruction fine-tuning. Research by USM [18] validates that smaller LLMs can also perform well on information extraction tasks, indicating that the interplay between task complexity and model size warrants further investigation.
Several studies have also explored improving model performance through prompt learning or synthetic data generation. However, the aforementioned research has largely overlooked the potential discrepancies between the instructions used during training and those encountered in practical applications, which may impair model performance. To tackle this issue, we propose a random chunk schema construction method and validate its effectiveness through experimental results.
2.2 Information extraction datasets
Large-scale pre-trained corpora are essential for the effectiveness of LLMs, as they provide a wealth of knowledge and serve as a foundation for language comprehension. Although annotated datasets for information extraction tasks are relatively abundant, there is considerable variation in their labeling patterns. Several studies have addressed this issue by focusing on aspects such as size, distribution, and schema standardization [19–22], including initiatives like InstructIE [22], IEPILE [23], KnowCoder [24], etc.
In accordance with existing research practices, we sampled datasets for the NER, RE, and EE tasks in both Chinese and English, then cleaned the sampled data to remove duplicates and low-quality samples. The cleaned datasets were subsequently mixed, and a chunk-based schema dataset construction strategy was employed to build the training set. This training set was used to fine-tune the Qwen2-7B-Chat [11] and Llama3.1-8B-Chat [12] models. For evaluation, we used a comprehensive test set consistent with existing research to ensure a fair assessment.
3 Method
Fig 1 illustrates a schematic representation of the method proposed in this paper, using the NER task as an example. The process begins with the collection and cleaning of datasets for the NER, RE, and EE tasks, followed by the integration of the cleaned datasets. The annotations present in the text are treated as positive labels, while the predefined hard schema dictionary is used to derive hard and easy negative labels. Hard negative labels are labels that are semantically similar to positive labels or that have similar compositions of label words; the hard/easy distinction is defined manually. For example, in Fig 1, the labels “administrative division of country” and “country administrative divisions” may pose challenges for determining location and are therefore classified as hard. We propose a carefully designed label mixing strategy along with a chunk-splitting approach to construct the final schema. This method significantly enhances the quality of the processed data and addresses the inconsistency in the number of schemas between the training and evaluation phases. The following subsections provide a detailed overview of the data processing methods and the generation of chunked instructions.
[Fig 1 omitted. See PDF.]
To uniformly model the IE tasks, including NER, RE, and EE, we formalize them as Eq (1):

$O = \mathrm{LLM}(\mathrm{Instruction},\ I)$ (1)

where Instruction comprises a natural language text sequence that includes three key elements: task type, task option, and output format. It provides a description of the task type to clearly specify the task, a description of the task option to delineate the range of labels for the output, and a description of the desired output format. The input I consists of a textual instance of the information extraction task, which is presented to the large language model alongside the instruction. The model then generates the output based on the constraints outlined in the instruction. The output O is a sentence representing the structured information extracted from the input text. The ChunkUIE framework employs JSON as the output format for all IE tasks.
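To make the formalization concrete, the following minimal Python sketch assembles such an instruction; the field names and prompt wording are illustrative assumptions, not the paper's exact template.

```python
import json

def build_instruction(task_type: str, schema: list[str], text: str) -> str:
    """Assemble an IE instruction with task description, schema, and source text."""
    prompt = {
        "instruction": f"You are an expert in {task_type}. Extract all items "
                       f"matching the given schema and return them as JSON.",
        "schema": schema,   # the labels the model is allowed to extract
        "input": text,      # the source text to extract from
    }
    return json.dumps(prompt, ensure_ascii=False)

# Example: an NER query restricted to a 4-label schema chunk
print(build_instruction("named entity recognition",
                        ["person", "organization", "location", "miscellaneous"],
                        "Tim Cook announced new products at Apple Park."))
```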
3.1 Data composition and cleaning
Data Collection. To meet the demands of various fields and practical applications, this paper focuses on the NER, RE, and EE tasks within information extraction. We collected datasets from multiple sources, resulting in a curated collection that includes both Chinese and English corpora. We build on existing high-quality work on information extraction datasets, such as UIE [9], IEINSTRUCTIONS [25], and YAYI-UIE [16]. The dataset presented in this paper is crafted based on our proposed strategy, further refining the collected datasets. Previous research has demonstrated that dataset quality significantly impacts the performance of supervised fine-tuning [22]. The dataset composition is shown in Fig 2.
[Fig 2 omitted. See PDF.]
Specifically, the NER task includes 13 datasets, comprising 10 English datasets. The 10 English datasets are CoNLL2003 [20], Ontonotes [26], MultiNERD [27], MIT Movie [28], HarveyNER [29], GENIA [30], BC2GM [31], BC5CDR [32], BC4CHEMD [31], and AnatEM [33]. Additionally, 3 Chinese datasets are MSRA [34], Resume NER [35], and CLUE NER [36]. The RE task encompasses 9 datasets, of which 7 are English: SemevalRE [37], SciERC [21], CoNLL2004 [38], NYT [39], KBP37 [40], GIDS [41], and ADE Corpus [42]. The two Chinese datasets for this task are DUIE2.0 [37] and CMeIE [43]. For the EE task, there are 4 datasets. Two English datasets are PHEE [44] and CASIE [45]. The other two Chinese datasets are DuEE1.0 [46] and DuEE-fin [47]. These datasets cover multiple domains, including general, medical, financial, social media, news, resume, and scientific fields.
Data Cleaning. First, to address duplicate and low-quality samples, we implemented a cleaning process based on each text's character-duplication rate and overall quality. The cleaning rules are as follows (a minimal sketch implementing them appears after the list):
1. Drop text with more than 70% repeated characters.
2. Drop texts shorter than ten characters without any labels.
3. Drop texts containing a high prevalence of stopwords (e.g., “the”, “of”) exceeding 80%.
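The sketch below implements one plausible reading of these three rules; the stopword list and the interpretation of “repeated characters” as the share of the most frequent character are assumptions.

```python
from collections import Counter

STOPWORDS = {"the", "of", "a", "an", "and", "to", "in"}  # stand-in list (assumption)

def keep_sample(text: str, labels: list) -> bool:
    """Return True if a sample survives the three cleaning rules."""
    chars = text.replace(" ", "")
    if chars:
        # Rule 1: drop text whose most frequent character exceeds 70% of characters.
        top_count = Counter(chars).most_common(1)[0][1]
        if top_count / len(chars) > 0.70:
            return False
    # Rule 2: drop very short texts that carry no labels.
    if len(text) < 10 and not labels:
        return False
    # Rule 3: drop texts dominated by stopwords (more than 80% of tokens).
    tokens = text.lower().split()
    if tokens and sum(t in STOPWORDS for t in tokens) / len(tokens) > 0.80:
        return False
    return True
```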
After deduplication, we found that some samples in the processed dataset still exhibited poor quality. To mitigate this, we manually filtered out samples deemed too low in quality. Inspired by InstructUIE, we aimed to map the labels from different datasets into a unified schema as much as possible. This process further enhances data quality, which we believe benefits model training and fine-tuning. Inspired by the research of Zhiqiang Hu et al., we utilize a subset of 132K samples to fine-tune the 7-billion-parameter LLMs. The implementation involves independent stratified sampling, where each minimum unit of the dataset is defined as a layer, denoted as $d_i$. The subset for the NER task can then be defined as Eq (2):

$S_{NER} = \bigcup_{i=1}^{k} \mathrm{Sample}(d_i,\ 0.2)$ (2)

where $k$ is the number of NER datasets and $d_i$ is one of the NER datasets, such as CoNLL2003 or Ontonotes. The value 0.2 represents randomly selecting 20% of the dataset without duplication. The EE and RE subsets are defined analogously:

$S_{RE} = \bigcup_{i=1}^{m} \mathrm{Sample}(d_i,\ 0.2)$ (3)

$S_{EE} = \bigcup_{i=1}^{n} \mathrm{Sample}(d_i,\ 0.2)$ (4)

where $m$ and $n$ are the numbers of RE and EE datasets, respectively. The final dataset is then constructed by combining the NER, EE, and RE subsets, as shown in Eq (5):

$T = S_{NER} \cup S_{RE} \cup S_{EE}$ (5)
Independent stratified sampling enhances the representativeness of samples by ensuring that each layer accurately reflects the characteristics of its corresponding population segment. Furthermore, this method guarantees that the number of samples in each layer is proportional to the population, thereby minimizing sampling errors.
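A minimal sketch of this independent stratified sampling, assuming each dataset is held as a list of samples and the 20% rate of Eqs (2)–(4):

```python
import random

def stratified_subset(datasets: dict[str, list], rate: float = 0.2,
                      seed: int = 42) -> list:
    """Independently sample `rate` of each dataset (layer) without replacement."""
    rng = random.Random(seed)            # fixed seed keeps the split reproducible
    subset = []
    for samples in datasets.values():
        k = int(len(samples) * rate)     # proportional to each layer's size
        subset.extend(rng.sample(samples, k))
    return subset

# Final corpus T of Eq (5): the union of the three task subsets
# T = stratified_subset(ner_sets) + stratified_subset(re_sets) + stratified_subset(ee_sets)
```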
3.2 Chunked instruction generation
Instructions are crucial for IE tasks, and three key components are essential within these instructions. (1) Task Description. This specifies the particular category of the IE task. (2) Source Text. This is the text from which the information is to be extracted. (3) Schema. This outlines the specific information to be extracted, such as entities, relations, and events. Among these three components, the schema is particularly critical due to its flexibility and its role in guiding the model towards the specific information required for the task. Consequently, constructing an appropriate schema is vital for enhancing the robustness of model performance.
The following sections will provide a detailed overview of the processes for constructing positive and negative schemas, as well as the methodology for creating chunked instructions.
3.2.1 Positive and negative schema construction.
The UIE introduces the concept of integrating a schema sequence, referred to as the structural schema instructor (SSI). Many studies have followed the SSI approach. However, this method typically utilizes all predefined labels from the dataset as SSI during the training phase. This results in a large number of labels being included in the training process, which can weaken the model’s ability to distinguish between similar but semantically different labels when faced with complex samples. Additionally, inconsistencies in the number and distribution of schemas specified during evaluation compared to training can severely harm model performance. For example, using 20 schemas during training but only 5 or 10 during evaluation creates a mismatch.
To tackle these issues, adjusting the number and composition of labels in the instruction during training is valuable. In this study, the schema sequence continues to follow the SSI concept. However, unlike existing research that commonly employs all labels to construct instructions, we generate stable instructions with a balanced number of positive and negative labels through the use of hard negative labels and chunked schemas. We define positive and negative labels as illustrated in Fig 1. The set of all predefined labels for a dataset comprises the collection L. For instance, the schema “location contains” present in the annotations serves as a positive schema, while all other schemas from the predefined label set L are classified as negative schemas. Inspired by contrastive learning, for a given text T, the schemas present in its annotations form the positive schema set $L_{pos}$, while the remaining schemas constitute the negative label set $L_{neg}$.
To further enhance the model's ability to distinguish between easily confused labels, we construct a hard schema dictionary D, which partitions the negative labels into $L_{neg}^{hard}$ and $L_{neg}^{easy}$. Hard negative labels are labels that are semantically similar to positive labels or that have similar compositions of label words. $L_{neg}^{hard}$ and $L_{neg}^{easy}$ are described by Eqs (6) and (7), respectively:

$L_{neg}^{hard} = \{\, l \in L_{neg} \mid \exists\, p \in L_{pos},\ (p, l) \in D \,\}$ (6)

$L_{neg}^{easy} = L_{neg} \setminus L_{neg}^{hard}$ (7)

Thus, the final negative set consists of the entire $L_{neg}^{hard}$ and a subset of $L_{neg}^{easy}$. The benefit of this approach is that it maintains the model's ability to differentiate between easily confused labels while reducing the number of highly similar samples, ultimately improving training efficiency. An example of a hard-negative pair is “administrative division of country” versus “country administrative divisions” from Fig 1.
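A minimal sketch of this split, assuming the hard schema dictionary D maps each label to its manually curated set of confusable labels:

```python
def split_negatives(all_labels: set[str], positives: set[str],
                    hard_dict: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split negatives into hard/easy using the predefined hard schema dictionary D."""
    negatives = all_labels - positives
    hard = set()
    for p in positives:
        # Any negative listed as confusable with a positive label is a hard negative.
        hard |= hard_dict.get(p, set()) & negatives
    return hard, negatives - hard

# Labels paraphrased from Fig 1:
D = {"location contains": {"administrative division of country",
                           "country administrative divisions"}}
hard, easy = split_negatives(
    {"location contains", "administrative division of country",
     "country administrative divisions", "founder"},
    {"location contains"}, D)
# hard == the two confusable location labels; easy == {"founder"}
```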
3.2.2 Chunked instruction construction.
We introduce a chunked instruction construction method aimed at aligning the number of schemas used during training and inference as closely as possible. Specifically, this is achieved by utilizing a dynamically adjustable chunk size $s$ to construct multiple chunked instructions from the set L. Consequently, the set is divided into N chunks for querying, with each chunk querying no more than $s$ schemas. The total number of resulting instructions is given by Eq (8):

$N = \lceil |L| / s \rceil$ (8)

where N is the number of chunks, $\lceil \cdot \rceil$ is the ceiling operator, and N is an integer. The chunking divides all labels into non-overlapping chunks, and SAMPLE denotes the random sampling of labels from the original set L to form a chunk, ensuring that each label has an equal probability of selection. This randomness mitigates the issue of deterministically grouping $L_{pos}$ with $L_{neg}^{hard}$, which could lead to performance degradation when both appear together during model evaluation.

The chunk size $s$ is a tuning parameter that should be chosen from the range $[1, |L|]$. However, setting it to 1 would result in an excessively large number of training samples, adversely affecting model performance. To enhance the efficiency of both training and usage, $s$ can be set as a common divisor of the schema counts across multiple datasets. In this study, $s$ is set to 4. To further improve the model's robustness, if the number of schemas in the last chunk is less than half of $s$, it is merged with the previous chunk; otherwise, it is retained as an independent chunk.
Sects 3.2.1 and 3.2.2 are summarized as Algorithm 1:
Algorithm 1 Chunked instruction generation.
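A minimal Python sketch of this procedure, applying the chunking rules above ($s = 4$, equal-probability sampling, and the half-chunk merge); the function name and seed handling are our assumptions, not the paper's:

```python
import random

def chunked_schemas(all_labels: list[str], s: int = 4,
                    seed: int = 0) -> list[list[str]]:
    """Randomly and reproducibly split the label set L into chunks of <= s labels."""
    rng = random.Random(seed)
    labels = all_labels[:]
    rng.shuffle(labels)          # SAMPLE: every label has equal selection probability
    chunks = [labels[i:i + s] for i in range(0, len(labels), s)]
    # Merge a trailing chunk smaller than half of s into the previous chunk.
    if len(chunks) > 1 and len(chunks[-1]) < s / 2:
        chunks[-2].extend(chunks.pop())
    return chunks

# One instruction is then built per chunk, giving N = ceil(|L| / s) instructions (Eq 8).
```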
For instance, if a dataset has a total of 48 schemas and a given $s$ of 4, traditional methods would generate 12 unique, non-repeating instructions. However, by employing the subsets of positive labels, hard negatives, and easy negatives, this requirement can be significantly reduced to just 4 instructions. The instruction format used in ChunkUIE resembles JSON strings, essentially forming a dictionary-style structure. This format comprises three main components: (1) “instruction”, which provides a task description outlining the objective of the instruction; (2) “schema”, a list of labels to be extracted; and (3) “source text”, the text from which the information is to be extracted. Examples of instructions corresponding to various tasks are provided in Fig 3.
[Fig 3 omitted. See PDF.]
The input and output formats adopt a structure similar to JSON strings.
After constructing the chunked instructions, supervised fine-tuning (SFT) is used to adapt an existing chat LLM to the IE task, as described by Eq (9):

$M_{UIE} = \mathrm{SFT}(M_{chat},\ T)$ (9)

where $M_{UIE}$ is the fine-tuned universal information extraction model, $M_{chat}$ is the pre-trained chat model, such as Qwen2-7B-Chat, and T is the information extraction corpus.
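As a hedged illustration, LoRA-based SFT of such a chat model could be set up with the Hugging Face peft library as follows; the checkpoint id, rank, alpha, and target modules are illustrative placeholders rather than the paper's Table 3 settings.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Base chat model (Hub id assumed; substitute the checkpoint actually used).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct")

lora_cfg = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                      lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_cfg)   # wraps frozen W0 with trainable A, B
model.print_trainable_parameters()        # typically ~0.1-1% of all weights
# Standard supervised fine-tuning on the chunked-instruction corpus T then follows.
```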
4 Experiments
In this section, we provide a detailed illustration of the datasets, evaluation metrics, models, experimental parameters, and results utilized in this study. Through comparative and ablation experiments, we demonstrate that the proposed method enhances the zero-shot performance of LLMs on NER, RE, and EE tasks across multiple datasets.
4.1 Datasets and evaluation metric
Training Datasets. For the NER task, we utilize 13 datasets, which include 10 English datasets: CoNLL2003 [20], Ontonotes [26], MultiNERD [27], MIT Movie [28], HarveyNER [29], GENIA [30], BC2GM [31], BC5CDR [32], BC4CHEMD [31], and AnatEM [33]; and 3 Chinese datasets: MSRA [34], Resume NER [35], and CLUE NER [36]. The RE task encompasses 9 datasets. Seven are English: SemevalRE [37], SciERC [21], CoNLL2004 [38], NYT [39], KBP37 [40], GIDS [41], and ADE Corpus [42]; two are Chinese: DUIE2.0 [37] and CMeIE [43]. For the EE task, there are 4 datasets: two English, PHEE [44] and CASIE [45], and two Chinese, DuEE1.0 [46] and DuEE-fin [47]. These datasets cover a wide range of domains, including general, medical, financial, social media, news, resumes, and scientific fields. Following the method outlined in Sect 3, the datasets are processed and mixed, resulting in a total of 132,715 sentences; 26,543 of them are held out as the test set, and the rest are used for training. The task distribution, domain, schemas, language, and sample number of the training dataset are shown in Table 1.
[Table 1 omitted. See PDF.]
Testing Datasets. The NER task utilizes the CrossNER [48], Boson (https://github.com/InsaneLife/), and Weibo [49] datasets; the reported CrossNER result is the average over its five subsets. The EE task utilizes the WikiEvents [50], RAMS [51], CrudeOil [52], FewFC [53], and CCFLaw (https://aistudio.baidu.com/projectdetail/4201483) datasets. The RE task utilizes the FewRel [54], SKE2020 (https://aistudio.baidu.com/datasetdetail/177191), COAE2016 (https://github.com/Sewens/COAE2016), and IPRE [55] datasets. The task distribution, domain, schemas, language, and sample number of the test dataset are shown in Table 2.
[Table 2 omitted. See PDF.]
Evaluation Metrics. We utilize span-based Micro-F1 as the primary metric for evaluating model performance; it is defined in Eq (12). For the NER task, the model must accurately identify both the boundaries of entities and their corresponding types. In the RE task, the model must precisely determine the subject and object entities within a relation, along with the relation type between them. For the EE task, we independently match event triggers (Trigger) and their associated arguments (Argument).
$\mathrm{Precision} = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)}$ (10)

$\mathrm{Recall} = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)}$ (11)

$\text{Micro-F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (12)

where $TP_i$, $FP_i$, and $FN_i$ are the true positives, false positives, and false negatives in the i-th sample, respectively.
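A minimal sketch of span-based micro-F1 over predicted and gold span sets, scoring exact matches on boundaries and type as described above:

```python
def micro_f1(preds: list[set], golds: list[set]) -> float:
    """Span-based micro-F1; each set holds (start, end, type) tuples for one sample."""
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        tp += len(p & g)   # exact boundary-and-type matches
        fp += len(p - g)   # predicted spans not in gold
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```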
4.2 Models and settings
To evaluate the zero-shot generalization capabilities, we selected several prominent models for comparative analysis:
1. UIE [9]: A unified text-to-structure generation framework capable of modeling various IE tasks generically.
2. LLaMA2 [56]: A series of LLMs ranging from 7 billion to 70 billion parameters.
3. Baichuan2 [17]: A collection of multilingual LLMs available in 7 billion and 13 billion parameter configurations.
4. Qwen1.5 [57]: A comprehensive series of language models that includes distinct models with varying parameter counts.
5. Qwen2 [11]: The successor series to Qwen1.5, likewise offering distinct models with varying parameter counts.
6. Mistral [58]: A 7-billion-parameter LLM designed for efficient performance.
7. ChatGPT [15]: Also known as GPT-3.5-turbo, a widely used proprietary LLM optimized for conversational applications.
8. LLaMA3.1 [12]: The latest release in the LLaMA model series, achieving significant improvements across various benchmarks.
9. InstructUIE [10]: A unified IE framework based on multi-task instruction tuning.
10. YAYI-UIE [16]: An end-to-end universal information extraction framework that supports both Chinese and English.
We employ the LoRA [13] technique for instruction tuning. In LoRA fine-tuning, the low-rank adjustment of the original weight matrix $W_0$ is accomplished by training two smaller matrices, A and B, within the target module; their product approximates the residual weight update. The forward computation of the adapted module is expressed as Eq (13):

$h = W_0 x + \Delta W x = W_0 x + B A x$ (13)

where $W_0 \in \mathbb{R}^{d \times k}$, $B \in \mathbb{R}^{d \times r}$, and $A \in \mathbb{R}^{r \times k}$. The rank r of LoRA is typically much smaller than the dimensions of the original model, allowing for rapid fine-tuning with only a minimal increase in trainable weights, usually between 0.1% and 1%. This approach ensures low storage and memory requirements. LoRA is broadly applicable, particularly in the dense projection layers of transformer architectures. By employing this technique, LoRA enhances the efficiency and cost-effectiveness of the fine-tuning process while preserving the performance of the original model. This method is especially advantageous for large language models that require frequent updates or adjustments for specific tasks.
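A minimal PyTorch sketch of the adapted module in Eq (13); the alpha/r scaling and the zero-initialization of B follow LoRA's usual conventions rather than anything stated in the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W0 plus a trainable low-rank residual B A (Eq 13)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze W0
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # BA starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```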
The experiments were conducted on a platform equipped with 2 Intel Xeon E5-2683 CPUs and 4 NVIDIA GeForce RTX 2080Ti GPUs. Detailed hyperparameter configurations for the fine-tuning process are summarized in Table 3.
[Table 3 omitted. See PDF.]
4.3 Main experimental results and analysis
Tables 4 and 5 present the F1 zero-shot performance across three tasks and two languages. Notably, the tested models include those with 7 billion and 13 billion parameters, as well as closed-source models like ChatGPT. Overall, our proposed method demonstrates a significant performance improvement for the 7 billion parameter open-source models after supervised fine-tuning, achieving performance that is comparable to, or even better than, ChatGPT.
[Table 4 omitted. See PDF.]
[Table 5 omitted. See PDF.]
From Table 4, it is evident that for the NER task on English datasets, ChunkUIE-Llama3.1-8B ranks second in performance, showing only a small gap compared to ChatGPT. Both ChunkUIE-Llama3.1-8B and ChunkUIE-Qwen2-7B significantly outperform most other models in English NER tasks. This confirms the effectiveness of our proposed method and highlights the value of incorporating a larger proportion of NER training data in the training dataset. In the EE task, ChunkUIE-Llama3.1-8B and ChunkUIE-Qwen2-7B demonstrate significantly better performance than models that were not fine-tuned on IE tasks, which can be attributed to the effectiveness of supervised fine-tuning. Although the performance on the WikiEvents and RAMS datasets slightly lags behind that of InstructUIE, this may be due to the advantages of using larger models. Nonetheless, the average performance in the EE task still favors ChunkUIE-Llama3.1-8B and ChunkUIE-Qwen2-7B. For the RE task, both ChunkUIE-Llama3.1-8B and ChunkUIE-Qwen2-7B outperform the models that were not fine-tuned, demonstrating the efficacy of fine-tuning. However, their performance is inferior to that of InstructUIE and YAYI-UIE, which may be influenced by the benefits of larger models.
From Table 5, it is evident that for the NER task on Chinese datasets, ChunkUIE-Qwen2-7B achieves the best average performance, while ChunkUIE-Llama3.1-8B underperforms. This discrepancy may be attributed to differences in the distribution of the fine-tuning data used for the Chinese NER task in this study compared to the data on which the Qwen model was trained. In the EE task, both ChunkUIE-Llama3.1-8B and ChunkUIE-Qwen2-7B significantly outperform other models, further validating the effectiveness of our proposed method. For the RE task, both models also lead in performance compared to others, demonstrating the efficacy of our approach and the effectiveness of supervised fine-tuning.
Overall, ChunkUIE-Qwen2-7B and ChunkUIE-Llama3.1-8B exhibit strengths and weaknesses across the three tasks in both Chinese and English. However, ChunkUIE-Qwen2-7B shows a distinct advantage in Chinese tasks, aligning with the inherent strengths of the Qwen2 model in handling Chinese language data. Through the above quantitative evaluations, we confirm that our method enhances SFT performance on both LLaMA and Qwen models, resulting in models that are comparable to, or even outperform, ChatGPT in information extraction tasks. This further validates the effectiveness of our proposed approach.
4.4 Ablation study
To validate the impact of hard negative samples, we conducted ablation experiments. Specifically, we constructed the negative set without using the hard schema dictionary, instead treating all labels apart from $L_{pos}$ as an undifferentiated $L_{neg}$. The remainder of the process remained unchanged. We then rebuilt the training set and fine-tuned the Qwen2-7B and Llama3.1-8B models using the LoRA technique, evaluating on the same testing datasets and metric to ensure a fair comparison. The experimental results are presented in Tables 6 and 7.
[Table 6 omitted. See PDF.]
[Table 7 omitted. See PDF.]
From Tables 6 and 7, it is evident that the use of a hard negative dictionary enhances model performance across most datasets. Specifically, both ChunkUIE-Qwen2-7B and ChunkUIE-Llama3.1-8B benefit from hard negatives in the NER task, likely due to the clear boundaries characteristic of entity recognition. In the EE task, the performance improvements for ChunkUIE-Qwen2-7B and ChunkUIE-Llama3.1-8B are limited. However, there is a more pronounced enhancement in performance for the RE task.
To validate the effectiveness of ChunkUIE in fine-tuning the baseline models for information extraction tasks, we present ablation study results on English and Chinese datasets in Tables 8 and 9, respectively. As shown in Table 8, the Llama3.1-8B model achieves significant performance improvements in the NER, EE, and RE tasks after fine-tuning with ChunkUIE. While the Qwen2-7B model already demonstrates strong initial NER performance, it also achieves notable gains across all tasks following the application of ChunkUIE. As shown in Table 9, the models fine-tuned with ChunkUIE likewise achieve significant improvements on the Chinese datasets.
[Table 8 omitted. See PDF.]
[Table 9 omitted. See PDF.]
The Qwen and Llama models used in this study adopt a decoder-only design, which offers stronger generation capabilities than encoder-based models. The long-sequence modeling capability of decoder-only models is particularly beneficial for information extraction, since the information to be extracted may be distributed across different sentences within a paragraph.
Overall, the introduction of contrastive learning principles significantly enhances the model’s ability to distinguish between easily confused terms, leading to improved performance across NER, RE, and EE tasks.
5 Conclusion
LLMs exhibit significant advantages in information extraction tasks. In this paper, we introduce ChunkUIE, which involves cleaning existing information extraction datasets and proposing a chunk-based random schema construction method. Additionally, by constructing challenging samples, we mitigate the model’s semantic confusion regarding similar patterns. Experiments on multiple datasets using supervised fine-tuning with LLaMA3.1-8B and Qwen2-7B validate the effectiveness of our proposed method in enhancing zero-shot performance for information extraction. Due to computational resource constraints, this study primarily investigates the fine-tuning effects of LLMs with approximately 7 billion parameters, leaving the specific performance on models of other sizes unclear. Exploring the data scalability of models of different sizes by gradually increasing the amount of data is one of the directions worth studying. Future work could focus on further enhancing dataset quality while investigating the information extraction capabilities of larger LLMs.
References
1. Vilar D, Freitag M, Cherry C, Luo J, Ratnakar V, Foster G. Prompting PaLM for translation: assessing strategies and performance. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023. p. 15406–27. https://aclanthology.org/2023.acl-long.859
2. Wu J, Gan W, Chen Z, Wan S, Philip SY. Multimodal large language models: a survey. In: 2023 IEEE International Conference on Big Data (BigData). 2023. p. 2247–56.
3. Tang H, Zhu D, Tang W, Wang S, Wang Y, Wang L. Research on joint model relation extraction method based on entity mapping. PLoS One. 2024;19(2):e0298974. pmid:38394238
4. Han D, Zheng Z, Zhao H, Feng S, Pang H. Span-based single-stage joint entity-relation extraction model. PLoS One. 2023;18(2):e0281055. pmid:36749758
5. Li B, Fang G, Yang Y, Wang Q, Ye W, Zhao W. Evaluating ChatGPT’s information extraction capabilities: an assessment of performance, explainability, calibration, and faithfulness. arXiv preprint 2023. https://arxiv.org/abs/2304.11633
6. Xu D, Chen W, Peng W, Zhang C, Xu T, Zhao X. Large language models for generative information extraction: a survey. arXiv preprint 2024. https://arxiv.org/abs/2312.17617
7. Wang S, Sun X, Li X, Ouyang R, Wu F, Zhang T. GPT-NER: named entity recognition via large language models. arXiv preprint 2023. https://arxiv.org/abs/2304.10428
8. Li P, Sun T, Tang Q, Yan H, Wu Y, Huang X, et al. CodeIE: large code generation models are better few-shot information extractors. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. p. 15339–53. https://aclanthology.org/2023.acl-long.855
9. Lu Y, Liu Q, Dai D, Xiao X, Lin H, Han X. Unified structure generation for universal information extraction. arXiv preprint 2022.
10. Wang X, Zhou W, Zu C, Xia H, Chen T, Zhang Y. InstructUIE: multi-task instruction tuning for unified information extraction. arXiv preprint 2023. https://arxiv.org/abs/2304.08085
11. Yang A, Yang B, Hui B, Zheng B, Yu B, Zhou C. Qwen2 technical report. arXiv preprint 2024. https://arxiv.org/abs/2407.10671
12. Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, Letman A, et al. The Llama 3 herd of models. arXiv preprint 2024. https://arxiv.org/abs/2407.21783
13. Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S. LoRA: low-rank adaptation of large language models. 2022. https://openreview.net/forum?id=nZeVKeeFYf9
14. Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W. Scaling instruction-finetuned language models. J Mach Learn Res. 2024;25(70):1–53.
15. Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P. Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst. 2022;35:27730–44.
16. Xiao X, Wang Y, Xu N, Wang Y, Yang H, Wang M. YAYI-UIE: a chat-enhanced instruction tuning framework for universal information extraction. arXiv preprint 2024. https://arxiv.org/abs/2312.15548
17. Yang A, Xiao B, Wang B, Zhang B, Bian C, Yin C, et al. Baichuan 2: open large-scale language models. arXiv preprint 2023. https://arxiv.org/abs/2309.10305
18. Lou J, Lu Y, Dai D, Jia W, Lin H, Han X, et al. Universal information extraction as unified semantic matching. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2023. p. 13318–26.
19. Riedel S, Yao L, McCallum A. Modeling relations and their mentions without labeled text. In: Balcázar JL, Bonchi F, Gionis A, Sebag M, editors. Machine learning and knowledge discovery in databases. Berlin, Heidelberg: Springer; 2010. p. 148–63.
20. Tjong Kim Sang EF, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. 2003. p. 142–7. https://aclanthology.org/W03-0419
21. Luan Y, He L, Ostendorf M, Hajishirzi H. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. p. 3219–32. https://aclanthology.org/D18-1360
22. Gui H, Qiao S, Zhang J, Ye H, Sun M, Liang L, et al. InstructIE: a bilingual instruction-based information extraction dataset. arXiv preprint 2024. https://arxiv.org/abs/2305.11527
23. Gui H, Yuan L, Ye H, Zhang N, Sun M, Liang L, et al. IEPile: unearthing large scale schema-conditioned information extraction corpus. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 2024. p. 127–46.
24. Li Z, Zeng Y, Zuo Y, Ren W, Liu W, Su M. KnowCoder: coding structured knowledge into LLMs for universal information extraction. arXiv preprint 2024. https://arxiv.org/abs/2403.07969
25. Wang X, Zhou W, Zu C, Xia H, Chen T, Zhang Y. InstructUIE: multi-task instruction tuning for unified information extraction. arXiv preprint 2023. https://arxiv.org/abs/2304.08085
26. Pradhan SS, Xue N. OntoNotes: the 90% solution. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, 2009. p. 11–2. https://aclanthology.org/N09-4006
27. Tedeschi S, Navigli R. MultiNERD: a multilingual, multi-genre and fine-grained dataset for named entity recognition (and disambiguation). In: Findings of the Association for Computational Linguistics: NAACL 2022. 2022. p. 801–12. https://aclanthology.org/2022.findings-naacl.60
28. Liu J, Pasupat P, Cyphers S, Glass J. Asgard: a portable architecture for multilingual dialogue systems. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. p. 8386–90.
29. Chen P, Xu H, Zhang C, Huang R. Crossroads, buildings and neighborhoods: a dataset for fine-grained location recognition. In: Carpuat M, de Marneffe MC, Meza Ruiz IV, editors. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle, United States: Association for Computational Linguistics; 2022. p. 3329–39. https://aclanthology.org/2022.naacl-main.243
30. Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics. 2003;19(suppl_1):i180–2.
31. Kocaman V, Talby D. Biomedical named entity recognition at scale. In: Pattern recognition. 2021. p. 635–46.
32. Zhang S, Cheng H, Gao J, Poon H. Optimizing bi-encoder for named entity recognition via contrastive learning. arXiv preprint 2022.
33. Pyysalo S, Ananiadou S. Anatomical entity mention recognition at literature scale. Bioinformatics. 2014;30(6):868–75. pmid:24162468
34. Levow GA. The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, 2006. p. 108–17. https://aclanthology.org/W06-0115
35. Zhang Y, Yang J. Chinese NER using lattice LSTM. In: Gurevych I, Miyao Y, editors. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics; 2018. p. 1554–64. https://aclanthology.org/P18-1144
36. Xu L, Tong Y, Dong Q, Liao Y, Yu C, Tian Y. CLUENER2020: fine-grained named entity recognition dataset and benchmark for Chinese. arXiv preprint 2020. https://arxiv.org/abs/2001.04351
37. Hendrickx I, Kim SN, Kozareva Z, Nakov P, Ó Séaghdha D, Padó S, et al. SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Erk K, Strapparava C, editors. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden: Association for Computational Linguistics; 2010. p. 33–8. https://aclanthology.org/S10-1006
38. Carreras X, Màrquez L. Introduction to the CoNLL-2004 shared task: semantic role labeling. In: Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004, 2004. p. 89–97. https://aclanthology.org/W04-2412
39. Takanobu R, Zhang T, Liu J, Huang M. A hierarchical framework for relation extraction with reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence; 2019. p. 7072–9.
40. Zhang J, Liu X, Lai X, Gao Y, Wang S, Hu Y, et al. 2INER: instructive and in-context learning on few-shot named entity recognition. In: Bouamor H, Pino J, Bali K, editors. Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics; 2023. p. 3940–51. https://aclanthology.org/2023.findings-emnlp.259
41. Jat S, Khandelwal S, Talukdar P. Improving distantly supervised relation extraction using word and entity based attention. arXiv preprint 2018. https://arxiv.org/abs/1804.06987
42. Gurulingappa H, Mateen-Rajput A, Toldo L. Extraction of potential adverse drug events from medical case reports. J Biomed Semantics. 2012;3(1):15. pmid:23256479
43. Guan T, Zan H, Zhou X, Xu H, Zhang K. CMeIE: construction and evaluation of Chinese medical information extraction dataset. In: Zhu X, Zhang M, Hong Y, He R, editors. Natural language processing and Chinese computing. Cham: Springer; 2020. p. 270–82.
44. Sun Z, Li J, Pergola G, Wallace B, John B, Greene N, et al. PHEE: a dataset for pharmacovigilance event extraction from text. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022. p. 5571–87. https://aclanthology.org/2022.emnlp-main.376
45. Satyapanich T, Ferraro F, Finin T. CASIE: extracting cybersecurity event information from text. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020. p. 8749–57.
46. Li X, Li F, Pan L, Chen Y, Peng W, Wang Q, et al. DuEE: a large-scale dataset for Chinese event extraction in real-world scenarios. In: Zhu X, Zhang M, Hong Y, He R, editors. Natural language processing and Chinese computing. Cham: Springer; 2020. p. 534–45.
47. Han C, Zhang J, Li X, Xu G, Peng W, Zeng Z. DuEE-Fin: a large-scale dataset for document-level event extraction. In: Lu W, Huang S, Hong Y, Zhou X, editors. Natural language processing and Chinese computing. Cham: Springer; 2022. p. 172–83.
48. Liu Z, Xu Y, Yu T, Dai W, Ji Z, Cahyawijaya S, et al. CrossNER: evaluating cross-domain named entity recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35; 2021. p. 13452–60.
49. Peng N, Dredze M. Named entity recognition for Chinese social media with jointly trained embeddings. In: Màrquez L, Callison-Burch C, Su J, editors. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics; 2015. p. 548–54. https://aclanthology.org/D15-1064
50. Li S, Ji H, Han J. Document-level event argument extraction by conditional generation. In: Toutanova K, Rumshisky A, Zettlemoyer L, Hakkani-Tur D, Beltagy I, Bethard S, et al., editors. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics; 2021. p. 894–908. https://aclanthology.org/2021.naacl-main.69
51. Ebner S, Xia P, Culkin R, Rawlins K, Van Durme B. Multi-sentence argument linking. arXiv preprint 2019.
52. Lee M, Soon LK, Siew EG, Sugianto LF. CrudeOilNews: an annotated crude oil news corpus for event extraction. In: Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, et al., editors. Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2022. p. 465–79. https://aclanthology.org/2022.lrec-1.49
53. Zhou Y, Chen Y, Zhao J, Wu Y, Xu J, Li J. What the role is vs. what plays the role: semi-supervised event argument extraction via dual question answering. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2021. p. 14638–46.
54. Han X, Zhu H, Yu P, Wang Z, Yao Y, Liu Z, et al. FewRel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. p. 4803–9. https://aclanthology.org/D18-1514
55. Wang H, He Z, Ma J, Chen W, Zhang M. IPRE: a dataset for inter-personal relationship extraction. In: Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9–14, 2019, Proceedings, Part II. Springer; 2019. p. 103–15.
56. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y. Llama 2: open foundation and fine-tuned chat models. arXiv preprint 2023. https://arxiv.org/abs/2307.09288
57. Bai J, Bai S, Chu Y, Cui Z, Dang K, Deng X. Qwen technical report. arXiv preprint 2023. https://arxiv.org/abs/2309.16609
58. Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, de las Casas D. Mistral 7B. arXiv preprint 2023. https://arxiv.org/abs/2310.06825
Citation: Li W, Liu Y, Yang Y, Zhang T, Men W (2025) ChunkUIE: Chunked instruction-based unified information extraction. PLoS One 20(6): e0326764. https://doi.org/10.1371/journal.pone.0326764
About the Authors:
Wei Li
Contributed equally to this work with: Wei Li, Yingzhen Liu
Roles: Conceptualization, Formal analysis, Methodology, Project administration, Visualization, Writing – original draft
Affiliation: National Defense University, Beijing, China
Yingzhen Liu
Contributed equally to this work with: Wei Li, Yingzhen Liu
Roles: Conceptualization, Formal analysis, Methodology, Software, Validation, Writing – review & editing
Affiliation: State Key Laboratory of Geo-Information Engineering, Beijing, China
Yinling Yang
Roles: Data curation
Affiliation: Beijing Gengtu Technology Co., Ltd., Beijing, China
Ting Zhang
Roles: Investigation, Validation, Visualization
Affiliation: School of Transportation, Southeast University, Nanjing, China
Wei Men
Roles: Methodology, Supervision
E-mail: [email protected]
Affiliation: Beijing Gengtu Technology Co., Ltd., Beijing, China
ORCID: https://orcid.org/0009-0001-9602-8870
© 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Large language models (LLMs) have demonstrated remarkable performance across various linguistic tasks. However, existing LLMs perform inadequately in information extraction tasks for both Chinese and English. Numerous studies attempt to enhance model performance by increasing the scale of training data. However, discrepancies in the number and type of schemas used during training and evaluation can harm model effectiveness. To tackle this challenge, we propose ChunkUIE, a unified information extraction model that supports Chinese and English. We design a chunked instruction construction strategy that randomly and reproducibly divides all schemas into chunks containing an identical number of schemas. This approach ensures that the union of schemas across all chunks encompasses all schemas. By limiting the number of schemas in each instruction, this strategy effectively addresses the performance degradation caused by inconsistencies in schema counts between training and evaluation. Additionally, we construct some challenging negative schemas using a predefined hard schema dictionary, which mitigates the model’s semantic confusion regarding similar schemas. Experimental results demonstrate that ChunkUIE enhances zero-shot performance in information extraction.