© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

This study presents an integrated approach for automatically extracting and structuring information from medical reports captured as scanned documents or photographs, combining image recognition with natural language processing (NLP) techniques such as named entity recognition (NER). The primary aim was to develop an adaptive model for efficient text extraction from medical report images. A genetic algorithm (GA) was used to fine-tune optical character recognition (OCR) hyperparameters so as to maximize the length of the extracted text. NER processing then categorized the extracted information into the required entities, and the parameters were adjusted when entities were not extracted correctly according to manual annotations. The dataset consists of medical report images in diverse formats, all in Russian; the work nonetheless serves as a conceptual example of information extraction (IE) that can be easily extended to other languages.
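The GA-driven tuning loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names and ranges in `SEARCH_SPACE`, the function `tune_ocr`, and the stand-in `fake_ocr` are all hypothetical, and the only element taken from the abstract is the fitness criterion (length of the extracted text). The NER-based adjustment step is not modeled here.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, int]

# Hypothetical search space; names and ranges are illustrative assumptions,
# not the hyperparameters used in the article.
SEARCH_SPACE: Dict[str, Tuple[int, int]] = {
    "threshold": (0, 255),
    "scale_percent": (50, 300),
    "blur_kernel": (0, 7),
}


def random_params(rng: random.Random) -> Params:
    """Draw one candidate uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: each gene comes from either parent with equal odds."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(p: Params, rng: random.Random, rate: float = 0.2) -> Params:
    """Resample each gene with probability `rate`."""
    child = dict(p)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def tune_ocr(
    extract_text: Callable[[Params], str],
    generations: int = 20,
    pop_size: int = 12,
    elite: int = 2,
    seed: int = 0,
) -> Tuple[Params, int]:
    """Search for parameters that maximize the length of the extracted text."""
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(pop_size)]

    def fitness(p: Params) -> int:
        return len(extract_text(p))

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the best candidates survive unchanged into the next generation.
        population = ranked[:elite] + children

    best = max(population, key=fitness)
    return best, fitness(best)


def fake_ocr(params: Params) -> str:
    """Stand-in for a real OCR call; output is longest near threshold=128."""
    return "x" * max(0, 100 - abs(params["threshold"] - 128))


best, score = tune_ocr(fake_ocr)
print(best, score)
```

In practice `extract_text` would wrap a real OCR engine applied to a preprocessed image. Text length alone can reward noisy output, which is presumably why the abstract pairs it with a check of the NER results against manual annotations.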

Details

Title
Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports
Author
Malashin, Ivan 1; Masich, Igor 1; Tynchenko, Vadim 1; Gantimurov, Andrei 1; Nelyub, Vladimir 2; Borodulin, Aleksei 1

1 Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia; [email protected] (I.M.)
2 Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia; [email protected] (I.M.); Scientific Department, Far Eastern Federal University, 690922 Vladivostok, Russia
First page
1361
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
2504-4990
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3072381021