© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Biomedical Named-Entity Recognition (BioNER) has become an essential part of text mining due to the continuously growing digital archives of biological and medical articles. While there are many well-performing BioNER tools for entities such as genes, proteins, diseases or species, there is very little research into named-entity recognition for food and dietary constituents. For this reason, in this paper, we study seven BioNER models for food and dietary constituent recognition. Specifically, we study a dictionary-based model, a conditional random fields (CRF) model and a new hybrid model, called FooDCoNER (Food and Dietary Constituents Named-Entity Recognition), which we introduce by combining the former two models. In addition, we study four deep language models: BERT, BioBERT, RoBERTa and ELECTRA. We find that FooDCoNER not only achieves the overall best results, comparable with those of the deep language models, but is also much more efficient with respect to run time and the sample size required for training. The latter was identified by studying learning curves. Overall, our results not only provide a new tool for food and dietary constituent NER but also shed light on the differences between classical machine learning models and recent deep language models.

Details

Title
Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition
Author
Perera, Nadeesha 1; Thi Thuy Linh Nguyen 1; Dehmer, Matthias 2; Emmert-Streib, Frank 1

1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland; [email protected] (N.P.); [email protected] (T.T.L.N.)
2 Department of Computer Science, Swiss Distance University of Applied Sciences, 3900 Brig, Switzerland; [email protected]; Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Austria; College of Artificial Intelligence, Nankai University, Tianjin 300350, China
First page
254
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
2504-4990
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2642487876