Abstract

Modern surgical training, guided by competency-based medical education, demands frequent assessment and feedback to support continuous skill development. Much early-stage practice has moved to simulation-based settings, where trainees develop technical skills outside the operating room. This shift has placed additional demands on expert instructors, whose time is now split between clinical duties and trainee supervision. Automated video analysis offers a scalable alternative to expert observation, enabling objective evaluation while reducing labor and cost. Delivering actionable feedback requires understanding the procedural context of surgical activity, which in turn requires accurate recognition of the surgical workflow; skill assessment can then be aligned with that context.

We propose incorporating workflow analysis as a prerequisite for skill assessment. Recent advances in deep learning have enabled accurate modeling of surgical workflow, approximating expert-level procedural understanding. However, training deep learning models typically relies on large volumes of annotated surgical video. In practice, such data is scarce and, when available, varies widely in surgical environment, visual appearance, and annotation scheme, making it difficult for models to learn generalizable features. The goal of this thesis is accurate recognition of surgical workflow for skill assessment while contending with data scarcity.

We present an automated video analysis framework for surgical workflow recognition and skill assessment in simulation-based training. We first develop a deep learning model for real-time workflow recognition in simulated cataract surgery, segmenting procedures into discrete tasks. The resulting task durations serve as interpretable metrics of technical proficiency and are shown to correlate with surgeon expertise. To overcome data scarcity and improve generalization, we introduce a cross-domain self-supervised learning strategy that pre-trains models on unlabeled surgical videos from both clinical and simulation domains. This approach incorporates clinically relevant context and improves performance in low-data settings; we further validate it on robot-assisted surgical suturing. The proposed method consistently outperforms standard baselines and supervised pre-training, particularly under visual and semantic domain misalignment.
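
As a rough illustration of the duration-based metrics described above (a minimal sketch, not the thesis implementation; the task names and frame rate are assumed), the following Python snippet collapses a per-frame task-label sequence, such as a real-time workflow-recognition model would produce, into per-task durations:

from itertools import groupby

FPS = 30  # assumed frame rate of the simulator video

def task_durations(frame_labels, fps=FPS):
    """Collapse per-frame task predictions into (task, seconds) segments."""
    durations = []
    for task, run in groupby(frame_labels):
        # Each run is a maximal stretch of consecutive identical labels
        durations.append((task, sum(1 for _ in run) / fps))
    return durations

# Hypothetical model output: 4 s of one task followed by 2 s of the next
predicted = ["incision"] * 120 + ["capsulorhexis"] * 60
for task, seconds in task_durations(predicted):
    print(f"{task}: {seconds:.1f} s")

Comparing such per-task durations against expert reference times is one simple way the segmentation output can be turned into interpretable feedback on technical proficiency.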

In conclusion, this thesis demonstrates the potential of surgical workflow recognition for skill assessment and the effectiveness of cross-domain pre-training. These contributions support the development of context-aware coaching systems that generalize to broader surgical training scenarios.

Details

Title: Automated Video Analysis with Deep Learning for Surgical Training
Number of pages: 83
Publication year: 2025
Degree date: 2025
School code: 0283
Source: MAI 87/6(E), Masters Abstracts International
ISBN: 9798270206154
University/institution: Queen's University (Canada)
University location: Canada -- Ontario
Degree: M.S.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32353519
ProQuest document ID: 3283374066
Document URL: https://www.proquest.com/dissertations-theses/automated-video-analysis-with-deep-learning/docview/3283374066/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic