Abstract
Objective
The United States Medical Licensing Examination (USMLE) assesses physicians' competency. Passing this exam is required to practice medicine in the U.S. With the emergence of large language models (LLMs) like ChatGPT and GPT-4, understanding their performance on these exams illuminates their potential in medical education and healthcare.
Materials and methods
A PubMed literature search following the 2020 PRISMA guidelines was conducted, focusing on studies using official USMLE questions and GPT models.
Results
Six relevant studies were identified out of 19 screened, with GPT-4 achieving the highest accuracy rates of 80–100% on the USMLE. Open-ended prompts typically outperformed multiple-choice formats, and 5-shot prompting slightly outperformed zero-shot prompting.
Conclusion
LLMs, especially GPT-4, display proficiency in tackling USMLE questions. As AI integrates further into healthcare, ongoing assessments against trusted benchmarks are essential.
Article Highlights
GPT-4 showed accuracy rates of 80–90% on the USMLE, outperforming previous GPT models.
Model performance is influenced more by a model's inherent capabilities than by prompting methods or the inclusion of media elements.
Further assessment against trusted benchmarks is essential for the effective integration of LLMs into healthcare.
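To illustrate the zero-shot versus 5-shot prompting strategies compared in the reviewed studies, the following is a minimal sketch of how such prompts are typically constructed; the questions and answers are invented placeholders, not actual USMLE content or any study's exact prompt template.

```python
# Illustrative sketch of zero-shot vs. 5-shot prompt construction for
# multiple-choice-style exam questions. All questions and answers below are
# invented placeholders, not actual USMLE items.

EXEMPLARS = [  # (question, answer) pairs shown to the model before the target
    ("Which vitamin deficiency causes scurvy?", "Vitamin C"),
    ("Which organ produces insulin?", "Pancreas"),
    ("Which nerve innervates the diaphragm?", "Phrenic nerve"),
    ("Which ion is most abundant intracellularly?", "Potassium"),
    ("Which vessel carries oxygenated blood from the lungs?", "Pulmonary vein"),
]

def build_prompt(question: str, n_shots: int = 0) -> str:
    """Prepend n_shots worked examples before the target question.

    n_shots=0 corresponds to zero-shot prompting; n_shots=5 to 5-shot.
    """
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in EXEMPLARS[:n_shots]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

zero_shot = build_prompt("Which bone is the longest in the human body?")
five_shot = build_prompt("Which bone is the longest in the human body?", n_shots=5)
```

The zero-shot prompt contains only the target question, while the 5-shot prompt prepends five worked examples, giving the model an in-context pattern to imitate.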
Details
1 Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); Tel-Aviv University, The Faculty of Medicine, Tel Aviv, Israel (GRID:grid.12136.37) (ISNI:0000 0004 1937 0546)
2 Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); Tel-Aviv University, The Faculty of Medicine, Tel Aviv, Israel (GRID:grid.12136.37) (ISNI:0000 0004 1937 0546); Chaim Sheba Medical Center, DeepVision Lab, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845)
3 Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351); The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
4 Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
5 Chaim Sheba Medical Center, Department of Diagnostic Imaging, Ramat Gan, Israel (GRID:grid.413795.d) (ISNI:0000 0001 2107 2845); The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)