Background
Integrating large language models (LLMs) with virtual patient platforms offers a novel approach to teaching clinical reasoning. This study evaluated the performance and educational value of combining Body Interact with two AI models, ChatGPT-4 and DeepSeek-R1, across acute care scenarios.
Methods
Three standardized cases (coma, stroke, and trauma) were simulated in Body Interact by two medical researchers. Structured case summaries were entered into both models using identical prompts. Outputs were assessed for diagnostic and treatment consistency, alignment with clinical reasoning stages, and educational quality, using expert scoring, AI self-assessment, text readability indices, and Grammarly analysis.
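To illustrate the readability assessment step, the sketch below computes one widely used index (Flesch Reading Ease) over a model-generated answer. The abstract does not specify which readability indices or tools were applied, so the choice of index, the heuristic syllable counter, and the sample text here are assumptions for illustration only.

```python
# Minimal sketch of one possible readability check (Flesch Reading Ease).
# The specific indices and tooling used in the study are not stated in the
# abstract; this heuristic and the sample text are illustrative assumptions.
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; real readability tools use dictionaries."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # discount a likely silent trailing 'e'
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch Reading Ease formula: higher scores indicate easier text.
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

if __name__ == "__main__":
    sample = ("The patient presents with sudden left-sided weakness. "
              "Obtain a non-contrast head CT and assess eligibility for thrombolysis.")
    print(f"Flesch Reading Ease: {flesch_reading_ease(sample):.1f}")
```

A score computed this way can be compared across the two models' outputs for the same case, which is one plausible way such an index could feed into the readability comparison reported in the results.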
Results
ChatGPT-4 performed best in stroke scenarios but was less consistent in coma and trauma cases. DeepSeek-R1 showed more stable diagnostic and therapeutic output across all cases. While both models received high expert and self-assessment scores, ChatGPT-4 produced more readable outputs, and DeepSeek-R1 demonstrated greater grammatical precision.
Conclusions
Our findings suggest that ChatGPT-4 and DeepSeek-R1 each offer unique strengths for AI-assisted instruction. ChatGPT-4’s accessible language may better support early learners, whereas DeepSeek-R1 may be more aligned with formal clinical reasoning. Selecting models based on specific teaching goals can enhance the effectiveness of AI-driven medical education.
Details
Educational Quality;
Physicians;
Patients;
Neurology;
Relevance (Education);
Educational Research;
Diagnostic Tests;
Medical Education;
Measurement Techniques;
Individualized Instruction;
Program Evaluation;
Decision Making;
Medical Evaluation;
Diagnostic Teaching;
Bilingualism;
Accuracy;
Artificial Intelligence;
Evaluative Thinking;
Educational Assessment;
Comparative Education;
Comparative Analysis;
Physical Examinations;
Interdisciplinary Approach;
Learner Engagement