
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background: In recent years, numerous artificial intelligence applications, especially generative large language models, have emerged in the medical field. This study conducted a structured comparative analysis of four leading generative large language models (LLMs), namely ChatGPT-4o (OpenAI), Grok-3 (xAI), Gemini-2.0 Flash (Google), and DeepSeek-V3 (DeepSeek), to evaluate their diagnostic performance in clinical case scenarios. Methods: We assessed medical knowledge recall and clinical reasoning capabilities through staged, progressively complex cases, with responses graded by expert raters on a 0–5 scale. Results: All models performed better on knowledge-based questions than on reasoning tasks, highlighting ongoing limitations in contextual diagnostic synthesis. Overall, DeepSeek-V3 outperformed the other models, achieving significantly higher scores across all evaluation dimensions (p < 0.05), particularly on medical reasoning tasks. Conclusions: While these findings support the feasibility of using LLMs for medical training and decision support, the study emphasizes the need for improved interpretability, prompt optimization, and rigorous benchmarking to ensure clinical reliability. This structured, comparative approach contributes to ongoing efforts to establish standardized evaluation frameworks for integrating LLMs into diagnostic workflows.

Details

Title
Assessing the Accuracy of Diagnostic Capabilities of Large Language Models
Author
Urda-Cîmpean, Andrada Elena 1; Leucuța, Daniel-Corneliu 1; Drugan, Cristina 2; Duțu, Alina-Gabriela 2; Călinici, Tudor 1; Drugan, Tudor 1

1 Department of Medical Informatics and Biostatistics, Iuliu Hațieganu University of Medicine and Pharmacy, 400349 Cluj-Napoca, Romania
2 Department of Medical Biochemistry, Iuliu Hațieganu University of Medicine and Pharmacy, 400349 Cluj-Napoca, Romania
First page
1657
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2075-4418
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3229142438