Abstract
Modern surgical training, guided by competency-based medical education, demands frequent assessment and feedback to support continuous skill development. Much early-stage training has moved to simulation-based settings, where trainees develop technical skills outside the operating room. This shift has placed additional demands on expert instructors, whose time is split between clinical duties and trainee supervision. Automated video analysis offers a scalable alternative to expert observation, enabling objective evaluation while reducing labor and cost. To deliver actionable feedback, skill assessment must be aligned with the procedural context of surgical activity, which in turn requires accurate recognition of surgical workflow.
We propose incorporating workflow analysis as a prerequisite for skill assessment. Recent advances in deep learning have enabled accurate modeling of surgical workflow, approximating expert-level procedural understanding. However, training deep learning models typically requires large volumes of annotated surgical videos. In practice, such data is scarce and, when available, varies widely in surgical environment, visual appearance, and annotation scheme. This variability makes it difficult for models to learn generalizable features. The goal of this thesis is accurate recognition of surgical workflow for skill assessment while contending with data scarcity.
We present an automated video analysis framework for surgical workflow recognition and skill assessment in simulation-based training. We develop a deep learning model for real-time workflow recognition in simulated cataract surgery, segmenting procedures into discrete tasks. The resulting task durations serve as interpretable metrics of technical proficiency and are shown to correlate with surgeon expertise. To overcome data scarcity and improve generalization, we introduce a cross-domain self-supervised learning strategy that pre-trains models on unlabeled surgical videos from both clinical and simulation domains. This approach incorporates clinically relevant context and improves performance in low-data settings. We further validate the approach on robot-assisted surgical suturing. The proposed method consistently outperforms standard baselines and supervised pre-training, particularly under visual and semantic domain misalignment.
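To make the duration-based metrics concrete, the following is a minimal Python sketch, not the thesis's actual implementation: it collapses frame-level workflow predictions into per-task durations. The frame rate, the task names, and the task_durations helper are illustrative assumptions.

    # Illustrative sketch only: derive per-task durations from frame-level
    # workflow predictions. Labels, fps, and helper name are hypothetical.
    from itertools import groupby

    def task_durations(frame_labels, fps=30.0):
        """Collapse a per-frame task-label sequence into (task, seconds) segments."""
        segments = []
        for task, run in groupby(frame_labels):
            n_frames = sum(1 for _ in run)      # length of this contiguous run
            segments.append((task, n_frames / fps))
        return segments

    # Example: predictions over a short 30 fps clip with hypothetical task names.
    labels = ["incision"] * 90 + ["capsulorhexis"] * 240 + ["incision"] * 30
    print(task_durations(labels))
    # -> [('incision', 3.0), ('capsulorhexis', 8.0), ('incision', 1.0)]

Summing segment lengths per task would yield the per-task totals of the kind the abstract describes as interpretable proficiency metrics.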
In conclusion, this thesis demonstrates the potential of surgical workflow recognition for skill assessment and the effectiveness of cross-domain pre-training. These contributions support the development of context-aware coaching systems that generalize to broader surgical training scenarios.
Details
Cataracts; Surgeons; Medical education; Deep learning; Sutures; Computer vision; Eye surgery; Video recordings; Neural networks; Laparoscopy; Virtual reality; Business metrics; Skills; Artificial intelligence; Film studies; Health education; Higher education; Information technology; Medical personnel; Medicine; Ophthalmology; Surgery