Abstract

Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper presents a comprehensive review of clinical NLP in the UK over the past 15 years to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £ = 41.97 m) funded by UK funders or the European Union’s funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. As part of a literature review, 431 publications are identified, of which 107 are eligible for final analysis. Results show that, not surprisingly, clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019–2022 was 80 times that of 2007–2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and to broaden application domains. There is also a need to improve links between academia and industry and to enable deployments in real-world settings, so that clinical NLP’s great potential in care delivery can be realised. The major barriers include research and development access to hospital data, a lack of capable computational resources in the right places, the scarcity of labelled data and barriers to the sharing of pretrained models.

Details

Title
A survey on clinical natural language processing in the United Kingdom from 2007 to 2022
Author
Wu, Honghan 1; Wang, Minhong 1; Wu, Jinge 2; Francis, Farah 3; Chang, Yun-Hsuan 1; Shavick, Alex 4; Dong, Hang 5; Poon, Michael T. C. 3; Fitzpatrick, Natalie 1; Levine, Adam P. 4; Slater, Luke T. 6; Handy, Alex 7; Karwath, Andreas 6; Gkoutos, Georgios V. 6; Chelala, Claude 8; Shah, Anoop Dinesh 1; Stewart, Robert 9; Collier, Nigel 10; Alex, Beatrice 11; Whiteley, William 3; Sudlow, Cathie 3; Roberts, Angus 12; Dobson, Richard J. B. 13

1  University College London, Institute of Health Informatics, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201)
2  University College London, Institute of Health Informatics, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201); University of Edinburgh, Usher Institute, Edinburgh, UK (GRID:grid.4305.2) (ISNI:0000 0004 1936 7988)
3  University of Edinburgh, Usher Institute, Edinburgh, UK (GRID:grid.4305.2) (ISNI:0000 0004 1936 7988)
4  University College London, Research Department of Pathology, UCL Cancer Institute, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201)
5  University of Edinburgh, Usher Institute, Edinburgh, UK (GRID:grid.4305.2) (ISNI:0000 0004 1936 7988); University of Oxford, Department of Computer Science, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948)
6  University of Birmingham, Institute of Cancer and Genomics, Birmingham, UK (GRID:grid.6572.6) (ISNI:0000 0004 1936 7486)
7  University College London, Institute of Health Informatics, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201); University College London Hospitals NHS Trust, London, UK (GRID:grid.52996.31) (ISNI:0000 0000 8937 2257)
8  Queen Mary University of London, Centre for Tumour Biology, Barts Cancer Institute, London, UK (GRID:grid.4868.2) (ISNI:0000 0001 2171 1133)
9  King’s College London, Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), London, UK (GRID:grid.13097.3c) (ISNI:0000 0001 2322 6764); South London and Maudsley NHS Foundation Trust, London, UK (GRID:grid.37640.36) (ISNI:0000 0000 9439 0839)
10  University of Cambridge, Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934) 
11  University of Edinburgh, Edinburgh Futures Institute, Edinburgh, UK (GRID:grid.4305.2) (ISNI:0000 0004 1936 7988) 
12  King’s College London, Department of Biostatistics & Health Informatics, London, UK (GRID:grid.13097.3c) (ISNI:0000 0001 2322 6764) 
13  University College London, Institute of Health Informatics, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201); King’s College London, Department of Biostatistics & Health Informatics, London, UK (GRID:grid.13097.3c) (ISNI:0000 0001 2322 6764) 
Pages
186
Publication year
2022
Publication date
Dec 2022
Publisher
Nature Publishing Group
e-ISSN
23986352
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2756517071
Copyright
© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.