Abstract

This study demonstrates the use of large language models (LLMs) to assess and coach surgeons on their non-technical skills, which are traditionally evaluated through subjective and resource-intensive methods. Llama 3.1 and Mistral effectively analyzed robotic-assisted surgery transcripts, identified exemplar and non-exemplar behaviors, and autonomously generated structured coaching feedback to guide surgeons' improvement. Our findings highlight the potential of LLMs as scalable, data-driven tools for enhancing surgical education and supporting consistent coaching practices.

Details

Title
Feasibility of large language models for assessing and coaching surgeons’ non-technical skills
Author
Obuseh, Marian (1); Singh, Sneha (2); Anton, Nicholas E. (3); Gardiner, Robin (3); Stefanidis, Dimitrios (3); Yu, Denny (1)

(1) Purdue University, Edwardson School of Industrial Engineering, West Lafayette, USA (GRID:grid.169077.e) (ISNI:0000 0004 1937 2197)
(2) Indian Institute of Technology, Koita Centre for Digital Health, Bombay, India (GRID:grid.467228.d) (ISNI:0000 0004 1806 4045)
(3) Indiana University, School of Medicine, Indianapolis, USA (GRID:grid.257413.6) (ISNI:0000 0001 2287 3919)
Pages
25
Publication year
2025
Publication date
Dec 2025
Publisher
Nature Publishing Group
e-ISSN
3005-1959
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3230339773
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.