Abstract

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, a process that can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of performing this conversion automatically. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.

Measurement(s)

Clinical Trial Eligibility Criteria

Technology Type(s)

natural language processing

Sample Characteristic - Organism

Homo sapiens

Details

Title
The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria
Author
Dobbins, Nicholas J. 1 ; Mullen, Tony 2 ; Uzuner, Özlem 3 ; Yetisgen, Meliha 1

1 University of Washington, Department of Biomedical Informatics & Medical Education, Seattle, USA (GRID:grid.34477.33) (ISNI:0000 0001 2298 6657)
2 Northeastern University, Khoury College of Computer Science, Seattle, USA (GRID:grid.261112.7) (ISNI:0000 0001 2173 3359)
3 George Mason University, Department of Information Sciences and Technology, Fairfax, USA (GRID:grid.22448.38) (ISNI:0000 0004 1936 8032)
Publication year
2022
Publication date
2022
Publisher
Nature Publishing Group
e-ISSN
20524463
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2700915942
Copyright
© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.