Abstract: Targeted feedback in cataract surgery is essential for refining surgical technique and supporting skill development. This paper introduces a framework that generates targeted feedback for specific procedural steps in cataract surgery videos, using a specialized feedback catalog tailored to assess critical surgical actions. Our approach employs a Video Masked Autoencoder (VideoMAE) as the feature extractor, enhanced with a Graph Attention Network (GAT) to capture inter-label dependencies. The framework achieves an AUC of 0.839 on a cataract surgery video dataset, improving classification accuracy and specificity across multiple feedback criteria compared with the other methods evaluated. Our findings demonstrate the framework's effectiveness in delivering structured, context-aware feedback and highlight the potential of GAT-based architectures for targeted feedback generation in surgical procedures.
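To make the idea of capturing inter-label dependencies with graph attention concrete, the following is a minimal sketch of a single-head graph attention layer over label nodes, written in plain Python. It is not the authors' implementation. Every name in it (`gat_layer`, `label_emb`, `video_feat`), the adjacency matrix, the random weights, and the final step of scoring each label by a dot product with a clip feature are assumptions made purely for illustration; `video_feat` is only a random stand-in for a VideoMAE embedding.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gat_layer(H, adj, W, a):
    """Single-head graph attention (illustrative sketch).

    H:   N node feature vectors (here: one node per feedback label)
    adj: N x N 0/1 adjacency matrix, assumed to include self-loops
    W:   out_dim x in_dim projection matrix
    a:   attention vector of length 2 * out_dim
    Returns the updated node features and the N x N attention matrix.
    """
    Z = [[dot(row, h) for row in W] for h in H]  # project every node
    n = len(H)
    out, attn = [], []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        # Unnormalized score for each neighbor: LeakyReLU(a . [z_i || z_j])
        scores = [leaky_relu(dot(a, Z[i] + Z[j])) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        row = [0.0] * n
        for j, w in zip(nbrs, alpha):
            row[j] = w
        attn.append(row)
        # New feature of node i: attention-weighted sum of neighbor projections
        out.append([sum(w * Z[j][k] for j, w in zip(nbrs, alpha))
                    for k in range(len(W))])
    return out, attn


# Hypothetical toy setup: 4 feedback labels connected in a chain.
random.seed(0)
num_labels, in_dim, out_dim = 4, 3, 5
label_emb = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(num_labels)]
adj = [[1, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 1, 1, 1],
       [0, 0, 1, 1]]
W = [[random.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
a = [random.gauss(0, 0.5) for _ in range(2 * out_dim)]
video_feat = [random.gauss(0, 1) for _ in range(out_dim)]  # stand-in clip feature

label_nodes, attn = gat_layer(label_emb, adj, W, a)
# Assumed scoring step: one sigmoid probability per feedback label.
probs = [sigmoid(dot(z, video_feat)) for z in label_nodes]
```

Each row of `attn` is a softmax over that label's neighbors, so labels that are not connected in `adj` contribute nothing to one another. In a trained model the projection, attention vector, and label graph would be learned or derived from data rather than drawn at random.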
External IDs: dblp:conf/isbi/XiaSPVS25