Content area

Abstract

The increasing availability of digital health records allows us to gain knowledge about the determinants of cancer outcomes in an unprecedented manner. Nevertheless, methods for efficiently and accurately identifying adverse events in large retrospective studies are limited. Natural language processing (NLP) is a potential solution for extracting valuable information from patient data, which is typically stored as unstructured text in isolated datasets within hospital systems. Given the success of large language models (LLMs) on natural language comprehension tasks, we explored generating a large labeled clinical notes dataset using a generative LLM with prompt engineering, and the efficacy of using this labeled dataset to fine-tune encoder-based LLMs. By comparing the predictions of the generative LLM LLaMA 70B against a small sample of clinical notes annotated by an oncologist, we determined that LLaMA 70B produces accurate note-level predictions of adverse event (AE) occurrence. We therefore used LLaMA 70B to annotate a dataset of 7,345 patients (412,530 clinical notes) from the MSK-IMPACT dataset. We compared the performance of fine-tuning ModernBERT and Clinical Longformer to predict AE occurrence on this annotated dataset against fine-tuning the same models on a version of the dataset annotated using clinical trial data. In this study, precision and recall scores of 0.80 are considered acceptable, as they reflect a reasonable balance between accurate predictions and sufficient sensitivity. Our results show that models fine-tuned on the LLaMA 70B labels outperform those fine-tuned on the labels derived from clinical trial data. To evaluate the quality of the LLaMA 70B-generated labels themselves, we compared the LLaMA 70B predictions to our clinical trial data: on our training set of 5,875 patients, LLaMA 70B achieved a macro-averaged recall of 0.90, accuracy of 0.71, precision of 0.07, F1-score of 0.13, and specificity of 0.70. The evaluation metrics were similar on the test set.
We find that LLaMA 70B note-level predictions serve as better labels than our clinical trial note-level labels, as both ModernBERT and Clinical Longformer performed better when trained and tested on the LLaMA 70B labels. Although we tried smoothing the LLaMA 70B predictions and further prompt engineering to improve patient-level predictions against ground-truth patient-level clinical trial labels, these methods did not yield notable improvements. We find that manual inspection of LLaMA's note-level predictions by a medical expert is the best way to validate them; the most effective approach to creating a clinical notes dataset with high-quality labels remains manual annotation of the notes by medical experts.

Details

Title
Adverse Event Prediction Using Natural Language Processing (NLP)
Number of pages
151
Publication year
2025
Degree date
2025
School code
0057
Source
MAI 86/11(E), Masters Abstracts International
ISBN
9798314872345
Advisor
Committee member
Shoop, Barry
University/institution
The Cooper Union for the Advancement of Science and Art
Department
Electrical Engineering
University location
United States -- New York
Degree
M.E.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32003387
ProQuest document ID
3201915407
Document URL
https://www.proquest.com/dissertations-theses/adverse-event-prediction-using-natural-language/docview/3201915407/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic