Abstract

The unstructured nature of Real-World (RW) data from onco-hematological patients and the limited access to integrated systems restrict the use of RW information for research purposes. Natural Language Processing (NLP) might help convert unstructured reports into standardized electronic health records. We used NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology), that recognizes information in pathology reports and populates electronic case report forms (eCRFs) pre-implemented in REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. Of the 332 reports, 326 (98.2%) were converted into the corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) the report identification number (all metrics > 90%), (2) the biopsy date (all metrics > 90% in both series), (3) the specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), and (4) the diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
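
The abstract does not define the four metrics, so the following is a minimal sketch that assumes the standard definitions: the F1-score is taken as the harmonic mean of precision and recall. The function name `f1_score` is illustrative and is not part of ARGO. Under that assumption, the specimen-type F1 values quoted above can be recomputed from the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Specimen-type precision and recall as reported in the abstract.
    internal = f1_score(0.985, 0.872)
    external = f1_score(1.000, 0.914)
    print(f"internal F1: {internal * 100:.1f}%")  # internal F1: 92.5%
    print(f"external F1: {external * 100:.1f}%")  # external F1: 95.5%
```

Both recomputed values agree with the F1-scores reported for the specimen type (92.5% and 95.5%), which is consistent with the standard definition being the one used.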

Details

Title
Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology
Author
Zaccaria, Gian Maria 1 ; Colella, Vito 2 ; Colucci, Simona 2 ; Clemente, Felice 3 ; Pavone, Fabio 3 ; Vegliante, Maria Carmela 3 ; Esposito, Flavia 4 ; Opinto, Giuseppina 5 ; Scattone, Anna 6 ; Loseto, Giacomo 5 ; Minoia, Carla 5 ; Rossini, Bernardo 5 ; Quinto, Angela Maria 5 ; Angiulli, Vito 7 ; Grieco, Luigi Alfredo 2 ; Fama, Angelo 8 ; Ferrero, Simone 9 ; Moia, Riccardo 10 ; Di Rocco, Alice 11 ; Quaglia, Francesca Maria 12 ; Tabanelli, Valentina 13 ; Guarini, Attilio 14 ; Ciavarella, Sabino 14

1  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Hematology and Cell Therapy Unit, Bari, Italy
2  Politecnico of Bari, Department of Electrical and Information Engineering, Bari, Italy (GRID:grid.4466.0) (ISNI:0000 0001 0578 5482)
3  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Hematology and Cell Therapy Unit, Bari, Italy (GRID:grid.4466.0)
4  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Hematology and Cell Therapy Unit, Bari, Italy (GRID:grid.4466.0); University of Bari Aldo Moro, Department of Mathematics, Bari, Italy (GRID:grid.7644.1) (ISNI:0000 0001 0120 3326)
5  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Hematology and Cell Therapy Unit, Bari, Italy (GRID:grid.7644.1)
6  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Pathology Department, Bari, Italy (GRID:grid.7644.1)
7  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Clinical Engineering Unit, Bari, Italy (GRID:grid.7644.1)
8  Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy (GRID:grid.4466.0)
9  AOU “Città Della Salute e Della Scienza di Torino”, Division of Hematology 1, Torino, Italy (GRID:grid.4466.0); University of Torino, Department of Molecular Biotechnologies and Health Sciences, Torino, Italy (GRID:grid.7605.4) (ISNI:0000 0001 2336 6580)
10  Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Division of Hematology, Novara, Italy (GRID:grid.412824.9) (ISNI:0000 0004 1756 8161) 
11  Azienda Ospedaliero-Universitaria Policlinico Umberto I, Unit of Hematology, Roma, Italy (GRID:grid.417007.5) 
12  University of Verona, Department of Medicine, Section of Hematology, Verona, Italy (GRID:grid.5611.3) (ISNI:0000 0004 1763 1124) 
13  European Institute of Oncology, IRCCS, Division of Diagnostic Haematopathology, Milano, Italy (GRID:grid.15667.33) (ISNI:0000 0004 1757 0843) 
14  IRCCS Istituto Tumori ‘Giovanni Paolo II’, Hematology and Cell Therapy Unit, Bari, Italy (GRID:grid.15667.33) 
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
2045-2322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2608625873
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.