Abstract

Named entity recognition is a fundamental subtask of knowledge graph construction and question answering in the field of agricultural diseases and pests. Although several studies have addressed it, the scarcity of annotated Chinese datasets has restricted the development of agricultural diseases and pests named entity recognition (ADP-NER). To address this issue, a large-scale corpus for the Chinese ADP-NER task, named AgCNER, was annotated. It contains 13 categories, 206,992 entities, and 66,553 samples with 3,909,293 characters. Compared with other datasets, AgCNER is the largest in terms of the number of categories, entities, samples, and characters. Moreover, it is the first publicly available corpus for the agricultural field. In addition, the agricultural language model AgBERT was fine-tuned and released. Finally, comprehensive experiments showed that BiLSTM-CRF achieved an F1-score of 93.58%, which improved further to 94.14% with BERT. Analysis from multiple aspects verified the rationality of AgCNER and the effectiveness of AgBERT. The annotated corpus and fine-tuned language model are publicly available at https://doi.org/XXX and https://github.com/guojson/AgCNER.git.
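The F1-scores quoted above are the harmonic mean of precision and recall. The abstract does not state the tagging scheme or the matching criterion, so the following is only a minimal sketch that assumes BIO tags and strict entity-level matching (a predicted entity counts only if its boundaries and type both match). The label names `PEST` and `DIS` and the function names are hypothetical, chosen for illustration; they are not taken from AgCNER.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored rather than treated as a new entity.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged strict entity-level F1 over a list of sequences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)  # exact boundary and type matches
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: one of two gold entities is found.
gold = [["B-PEST", "I-PEST", "O", "B-DIS"]]
pred = [["B-PEST", "I-PEST", "O", "O"]]
print(f"F1 = {entity_f1(gold, pred):.4f}")
```

In the toy example, precision is 1/1 and recall is 1/2, so the F1-score is 2/3. Whether the reported 93.58% and 94.14% were computed in exactly this way is not stated in the abstract.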

Details

Title
AgCNER, the First Large-Scale Chinese Named Entity Recognition Dataset for Agricultural Diseases and Pests
Author
Yao, Xiaochuang 1 ; Hao, Xia 2 ; Liu, Ruilin 2 ; Li, Lin 3 ; Guo, Xuchao 2

1 China Agricultural University, College of Land Science and Technology, Beijing, China (GRID:grid.22935.3f) (ISNI:0000 0004 0530 8290)
2 Shandong Agricultural University, College of Information Science and Engineering, Tai’an, China (GRID:grid.440622.6) (ISNI:0000 0000 9482 4676)
3 China Agricultural University, College of Information and Electrical Engineering, Beijing, China (GRID:grid.22935.3f) (ISNI:0000 0004 0530 8290)
Pages
769
Publication year
2024
Publication date
2024
Publisher
Nature Publishing Group
e-ISSN
20524463
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3079612889
Copyright
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.