
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Objective

The United States Medical Licensing Examination (USMLE) assesses physicians' competency. Passing this exam is required to practice medicine in the U.S. With the emergence of large language models (LLMs) like ChatGPT and GPT-4, understanding their performance on these exams illuminates their potential in medical education and healthcare.

Materials and methods

A PubMed literature search following the 2020 PRISMA guidelines was conducted, focusing on studies using official USMLE questions and GPT models.

Results

Of the 19 studies screened, six met the inclusion criteria. GPT-4 showed the highest accuracy, at 80–100% on the USMLE. Open-ended prompts typically outperformed multiple-choice formats, and 5-shot prompting slightly edged out zero-shot.

Conclusion

LLMs, especially GPT-4, display proficiency in tackling USMLE questions. As AI integrates further into healthcare, ongoing assessments against trusted benchmarks are essential.

Article Highlights

GPT-4 showed accuracy rates of 80–90% on the USMLE, outperforming previous GPT models.

Model performance is influenced more by the model's inherent capabilities than by prompting methods or the inclusion of media elements.

Further assessments against trusted benchmarks are essential for the effective integration of LLMs into healthcare.

Details

Title
How GPT models perform on the United States medical licensing examination: a systematic review
Author
Brin, Dana (1); Sorin, Vera (2); Konen, Eli (1); Nadkarni, Girish (3); Glicksberg, Benjamin S. (4); Klang, Eyal (5)

(1) Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); Tel-Aviv University, The Faculty of Medicine, Tel Aviv, Israel (GRID:grid.12136.37) (ISNI:0000 0004 1937 0546)
(2) Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); Tel-Aviv University, The Faculty of Medicine, Tel Aviv, Israel (GRID:grid.12136.37) (ISNI:0000 0004 1937 0546); Chaim Sheba Medical Center, DeepVision Lab, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845)
(3) Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351); The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
(4) Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
(5) Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
Pages
500
Publication year
2024
Publication date
Oct 2024
Publisher
Springer Nature B.V.
ISSN
2523-3963
e-ISSN
2523-3971
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3106551828