
© 2025 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background

Chat Generative Pre-Trained Transformer (ChatGPT), launched by OpenAI in November 2022, features advanced large language models optimized for dialog. However, the performance differences between GPT-3.5, GPT-4, and GPT-4o in medical contexts remain unclear.

Objective

This study evaluates the accuracy of GPT-3.5, GPT-4, and GPT-4o across various medical subjects. GPT-4o’s performance in Chinese and English was also analyzed.

Methods

We retrospectively compared GPT-3.5, GPT-4, and GPT-4o in Stage 1 of the Taiwanese Senior Professional and Technical Examinations for Medical Doctors (SPTEMD) from July 2021 to February 2024, excluding image-based questions.

Results

The overall accuracy rates of GPT-3.5, GPT-4, and GPT-4o were 65.74% (781/1188), 95.71% (1137/1188), and 96.72% (1149/1188), respectively. GPT-4 and GPT-4o outperformed GPT-3.5 across all subjects. Statistical analysis revealed a significant difference between GPT-3.5 and the other models (p < 0.05) but no significant difference between GPT-4 and GPT-4o. Among subjects, physiology had a significantly higher error rate (p < 0.05) than the overall average across all three models. GPT-4o’s accuracy rates in Chinese (98.14%) and English (98.48%) did not differ significantly.
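The reported comparisons can be reproduced from the counts given above. As a minimal sketch, the snippet below recomputes each model's accuracy and compares models pairwise with a Pearson chi-square statistic on 2×2 correct/incorrect tables; the specific significance test used by the authors is not stated in this abstract, so the chi-square here is an assumption for illustration.

```python
# Counts from the abstract: correct answers out of 1,188 non-image questions.
TOTAL = 1188
CORRECT = {"GPT-3.5": 781, "GPT-4": 1137, "GPT-4o": 1149}

def accuracy_pct(correct: int, total: int = TOTAL) -> float:
    """Accuracy as a percentage, rounded to two decimals as in the abstract."""
    return round(100 * correct / total, 2)

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]], e.g. rows = models, cols = correct/wrong."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def compare(model_x: str, model_y: str) -> float:
    """Chi-square for correct/incorrect counts of two models."""
    x, y = CORRECT[model_x], CORRECT[model_y]
    return chi_square_2x2(x, TOTAL - x, y, TOTAL - y)

if __name__ == "__main__":
    for model, c in CORRECT.items():
        print(f"{model}: {accuracy_pct(c)}% ({c}/{TOTAL})")
    # Critical value for chi-square with 1 df at alpha = 0.05 is ~3.84.
    print("GPT-3.5 vs GPT-4: ", round(compare("GPT-3.5", "GPT-4"), 2))
    print("GPT-4 vs GPT-4o:  ", round(compare("GPT-4", "GPT-4o"), 2))
```

Consistent with the abstract, the GPT-3.5 vs. GPT-4 statistic is far above the 3.84 critical value (1 df, α = 0.05), while the GPT-4 vs. GPT-4o statistic falls below it.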

Conclusions

GPT-4 and GPT-4o exceed the accuracy threshold for the Taiwanese SPTEMD, demonstrating advances in contextual comprehension and reasoning. Future research should focus on their responsible integration into medical training and assessment.

Details

Title
Advancing medical AI: GPT-4 and GPT-4o surpass GPT-3.5 in Taiwanese medical licensing exams
Author
Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung
First page
e0324841
Section
Research Article
Publication year
2025
Publication date
Jun 2025
Publisher
Public Library of Science
e-ISSN
1932-6203
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3215943326