© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Systems supporting systematic literature reviews often use machine learning algorithms to create classification models that assess the relevance of articles to study topics. The choice of text representation for such algorithms may have a significant impact on their predictive performance. This article presents an in-depth investigation of the utility of the bag of concepts representation for this purpose. Bag of concepts can be considered an enhanced form of the ubiquitous bag of words representation, with features corresponding to ontology concepts rather than words. Its utility is evaluated in the active learning setting, in which a sequence of classification models is created, with training data iteratively expanded by adding articles selected for human screening. Different versions of the bag of concepts are compared with bag of words, as well as with combined representations that include both word-based and concept-based features. The evaluation uses the support vector machine, naive Bayes, and random forest algorithms and is performed on datasets from 15 systematic medical literature review studies. The results show that concept-based features may have additional predictive value in comparison to standard word-based features and that the combined bag of concepts and bag of words representation is the most useful overall.
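The active learning setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The concept lexicon `CONCEPTS`, the feature prefixes `w=` and `c=`, the function names, and the selection rule (screen the article with the highest predicted relevance next) are all assumptions made for illustration. A small hand-written multinomial naive Bayes stands in for the classifiers named in the abstract, and a real system would obtain concepts from an ontology-based annotator rather than a word lookup.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping words to ontology concept IDs (assumption).
CONCEPTS = {
    "aspirin": "C_DRUG", "ibuprofen": "C_DRUG",
    "stroke": "C_DISEASE", "infarction": "C_DISEASE",
}

def featurize(text):
    """Combined representation: word counts plus concept counts."""
    words = text.lower().split()
    feats = Counter("w=" + w for w in words)
    feats.update("c=" + CONCEPTS[w] for w in words if w in CONCEPTS)
    return feats

def train_nb(docs, labels):
    """Collect per-class feature counts for multinomial naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def prob_relevant(model, doc):
    """Posterior probability of class 1 (relevant), with add-one smoothing."""
    counts, priors, vocab = model
    total = sum(priors.values())
    logp = {}
    for y in (0, 1):
        # The extra +1 reserves probability mass for features unseen in training.
        denom = sum(counts[y].values()) + len(vocab) + 1
        lp = math.log((priors[y] + 1) / (total + 2))
        for feat, n in doc.items():
            lp += n * math.log((counts[y][feat] + 1) / denom)
        logp[y] = lp
    m = max(logp.values())
    e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
    return e1 / (e0 + e1)

def active_learning(texts, oracle, seed_idx, rounds, batch=1):
    """Retrain each round, then send the top-ranked unlabeled articles to the oracle."""
    docs = [featurize(t) for t in texts]
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(rounds):
        pool = [i for i in range(len(docs)) if i not in labeled]
        if not pool:
            break
        idx = sorted(labeled)
        model = train_nb([docs[i] for i in idx], [labeled[i] for i in idx])
        # Assumed selection rule: highest predicted relevance first.
        pool.sort(key=lambda i: prob_relevant(model, docs[i]), reverse=True)
        for i in pool[:batch]:
            labeled[i] = oracle(i)  # simulated human screening decision
    return labeled
```

Here `oracle` plays the role of the human screener: it is any function that returns 0 or 1 for an article index. Dropping the `c=` features from `featurize` gives a plain bag of words baseline for comparison.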

Details

Title
Active Learning for Medical Article Classification with Bag of Words and Bag of Concepts Embeddings
Author
Pytlak Radosław 1; Cichosz Paweł 2; Fajdek Bartłomiej 3; Jastrzębski Bogdan 1

1 Faculty of Mathematics and Information Sciences, Warsaw University of Technology, 00-665 Warsaw, Poland; [email protected]
2 Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland; [email protected]
3 Faculty of Mechatronics, Warsaw University of Technology, 00-665 Warsaw, Poland; [email protected]
First page
7955
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
20763417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3233064364