Full text

Turn on search term navigation

© 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background:Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients’ access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain.

Objective:The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models in answering patients with OMG-related questions was assessed through accuracy, completeness, readability, usefulness, and safety, and patients’ ratings of their usability and readability were analyzed.

Methods:The study was conducted in two phases: 130 choice ophthalmology examination questions were input into 5 different LLMs. Their performance was compared with that of undergraduates, master’s students, and ophthalmology residents. In addition, 23 common patients with OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs from the first phase, each asking 3 questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists evaluated the responses again using the 5 domains.

Results:ChatGPT o1-preview achieved the highest accuracy rate of 73% on 130 ophthalmology examination questions, outperforming other LLMs and professional groups like undergraduates and master’s students. For 23 common patients with OMG-related questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). GEMINI (Google DeepMind) provided the easiest-to-understand responses in readability assessments, while GPT-4o had the most complex responses, suitable for readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5’s responses were slightly more readable (4.31 vs 4.03, P=.01).

Conclusions:LLMs such as ChatGPT o1-preview may have the potential to enhance patient education. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice.

Details

Title
Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study
Author
Wei, Bin  VIAFID ORCID Logo  ; Yao, Lili  VIAFID ORCID Logo  ; Hu, Xin  VIAFID ORCID Logo  ; Hu, Yuxiang  VIAFID ORCID Logo  ; Rao, Jie  VIAFID ORCID Logo  ; Yu, Ji  VIAFID ORCID Logo  ; Dong, Zhuoer  VIAFID ORCID Logo  ; Duan, Yichong  VIAFID ORCID Logo  ; Wu, Xiaorong  VIAFID ORCID Logo 
First page
e67883
Section
e-Learning and Digital Medical Education
Publication year
2025
Publication date
2025
Publisher
Gunther Eysenbach MD MPH, Associate Professor
e-ISSN
1438-8871
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3222369576
Copyright
© 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.