
Abstract

Background: Accurately assessing perioperative risk is critical for informed surgical planning and patient safety. However, current prediction models often rely on structured data and overlook the nuanced clinical reasoning embedded in free-text preoperative notes. Recent advances in large language models (LLMs) have opened opportunities for harnessing unstructured clinical data, yet their application in perioperative prediction remains limited by concerns about factual accuracy. Retrieval-augmented generation (RAG) offers a promising solution: it grounds LLM outputs in domain-specific knowledge sources, potentially improving both predictive accuracy and clinical interpretability.

Objective: This study aimed to investigate whether integrating LLMs with RAG can improve the prediction of 30-day postoperative mortality and American Society of Anesthesiologists (ASA) physical status classification using unstructured preoperative clinical notes.

Methods: We conducted a retrospective cohort study using 24,491 medical records from a tertiary medical center, including preoperative anesthesia assessments, discharge summaries, and surgical information. To extract clinical insights from free-text data, we used the LLaMA 3.1-8B language model with RAG, with MedEmbed for text embedding and Miller’s Anesthesia as the primary retrieval source. We evaluated model performance under various configurations, varying the embedding model, the chunk size, and the use of few-shot prompting. Machine learning (ML) models, including random forest, support vector machines (SVM), Extreme Gradient Boosting (XGBoost), and logistic regression, were trained on structured features as baselines.
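The retrieval step described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the bag-of-words `embed` function is only a stand-in for an embedding model such as MedEmbed, and the function names, word-based chunk sizes, and prompt layout are all assumptions made for illustration.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (sizes in words, overlap < chunk_size)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(w.strip(".,;:").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(note, passages, examples):
    """Assemble retrieved passages and few-shot examples into one prompt (assumed layout)."""
    context = "\n".join(f"- {p}" for p in passages)
    shots = "\n".join(f"Note: {n}\nASA class: {label}" for n, label in examples)
    return (f"Reference passages:\n{context}\n\n"
            f"Examples:\n{shots}\n\n"
            f"Note: {note}\nASA class:")
```

In this sketch, the chunk size and the choice of `embed` are the two knobs that correspond to the configurations the study varied, and passing an empty `examples` list gives the prompt without few-shot examples.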

Results: A total of 520 (2.1%) patients experienced in-hospital 30-day postoperative mortality. The ASA physical status distribution was as follows: class I: 535 (2.2%); class II: 15,272 (62.4%); class III: 8024 (32.8%); class IV: 606 (2.5%); and class V: 54 (0.22%). For 30-day postoperative mortality prediction, the LLaMA-RAG model achieved an F1-score of 0.4663 (95% CI 0.4654-0.4672), versus 0.2369 (95% CI 0.2341-0.2397) without few-shot prompting, 0.0879 (95% CI 0.0717-0.1041) without RAG, and 0.0436 (95% CI 0.0292-0.0580) without either few-shot prompting or RAG. Among ML models, XGBoost scored 0.4459 (95% CI 0.4176-0.4742); random forest, 0.3953 (95% CI 0.3791-0.4115); logistic regression, 0.2720 (95% CI 0.2647-0.2793); and SVM, 0.2474 (95% CI 0.2275-0.2673). For ASA classification, LLaMA-RAG achieved a micro F1-score of 0.8409 (95% CI 0.8238-0.8551) versus 0.6546 (95% CI 0.6430-0.6796) without few-shot prompting, 0.6340 (95% CI 0.6157-0.6535) without RAG, and 0.4238 (95% CI 0.3952-0.4490) without either few-shot prompting or RAG. In comparison, XGBoost achieved 0.8273 (95% CI 0.8209-0.8498); logistic regression, 0.7940 (95% CI 0.7671-0.7950); random forest, 0.7847 (95% CI 0.7637-0.7868); and SVM, 0.7697 (95% CI 0.7637-0.7697). Notably, the LLaMA-RAG model demonstrated exceptional sensitivity in identifying rare but high-risk cases, such as ASA class V patients and postoperative deaths.
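To show why F1-scores are informative at a 2.1% mortality rate, the sketch below computes a binary F1-score and a micro-averaged F1-score from first principles. Only the cohort size (24,491) and death count (520) come from the abstract. The function names are illustrative, and the "always predict survival" baseline is a hypothetical comparison, not a model from the study.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions.

    Counts are pooled over classes: each wrong prediction is one false
    positive (for the predicted class) and one false negative (for the
    true class), so micro F1 equals plain accuracy in this setting.
    """
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - tp
    return f1_score(tp, fp=errors, fn=errors)


# Cohort figures reported in the abstract.
n_patients, n_deaths = 24491, 520

# Hypothetical baseline that predicts "survives" for every patient:
# accuracy looks high, but no death is ever identified, so F1 is zero.
baseline_accuracy = (n_patients - n_deaths) / n_patients
baseline_f1 = f1_score(tp=0, fp=0, fn=n_deaths)
```

Because such a baseline is right for almost 98% of patients while detecting no deaths, accuracy alone would hide exactly the rare high-risk cases the study targets.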

Conclusions: The LLaMA-RAG model significantly improved the prediction of postoperative mortality and ASA classification, especially for rare high-risk cases. By grounding outputs in domain knowledge, RAG enhanced both accuracy and prompt-driven interpretability over ML and ablation models, highlighting its promise for real-world clinical decision support.

Details

Title
Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study
Author
Publication title
Volume
27
First page
e75052
Publication year
2025
Publication date
2025
Section
Clinical Information and Decision Making
Publisher
Gunther Eysenbach MD MPH, Associate Professor
Place of publication
Toronto
Country of publication
Canada
e-ISSN
1438-8871
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-06-03
Milestone dates
2025-03-27 (Preprint first published); 2025-03-27 (Submitted); 2025-05-08 (Revised version received); 2025-05-12 (Accepted); 2025-06-03 (Published)
First posting date
2025-06-03
ProQuest document ID
3222369277
Document URL
https://www.proquest.com/scholarly-journals/predicting-30-day-postoperative-mortality/docview/3222369277/se-2?accountid=208611
Copyright
© 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic