Leveraging Graph Attention Networks for Targeted Feedback from Operating Room Surgery Videos

Mingze Xia, Nisarg A. Shah, Vishal M. Patel, S. Swaroop Vedula, Shameema Sikder

Published: 01 Jan 2025, Last Modified: 11 Nov 2025 | ISBI 2025 | CC BY-SA 4.0

Abstract: Providing targeted feedback in cataract surgery is essential for refining surgical techniques and supporting skill development. This paper introduces a framework for generating targeted feedback based on specific procedural steps in cataract surgery videos, using a specialized feedback catalog tailored to assess critical surgical actions. Our approach employs a Video Masked Autoencoder (VideoMAE) as the feature extractor, enhanced with a Graph Attention Network (GAT) to capture inter-label dependencies. This framework achieves an AUC of 0.839 on a cataract surgery video dataset, improving classification accuracy and specificity across multiple feedback criteria compared with various other methods. Our findings demonstrate its effectiveness in delivering structured, context-aware feedback, and highlight the potential of GAT-based architectures in advancing targeted feedback generation for surgical procedures.
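To make the idea of capturing inter-label dependencies with graph attention concrete, the following is a minimal sketch and not the authors' implementation. It assumes each feedback criterion is a node in a label graph, applies one single-head graph attention layer to the label embeddings, and scores a clip by taking the dot product between a pooled video feature and each attended label representation. The names `gat_layer` and `predict_feedback`, the dimensions, the adjacency matrix, and the random vector standing in for a VideoMAE feature are all hypothetical; the abstract does not specify how the GAT output is combined with the video features.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention over label nodes (illustrative only).

    H: (L, d_in) label node embeddings
    A: (L, L) binary adjacency, assumed to include self-loops
    W: (d_in, d_out) shared linear projection
    a: (2 * d_out,) attention vector
    """
    Z = H @ W                                   # projected nodes, (L, d_out)
    d_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = Z @ a[:d_out]
    dst = Z @ a[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :], slope)
    e = np.where(A > 0, e, -np.inf)             # attend only to graph neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)    # attention weights, rows sum to 1
    return alpha @ Z, alpha

def predict_feedback(video_feat, label_emb, A, W, a):
    """Return one probability per feedback criterion (multi-label output)."""
    label_repr, _ = gat_layer(label_emb, A, W, a)
    logits = label_repr @ video_feat            # (L,)
    return 1.0 / (1.0 + np.exp(-logits))        # independent sigmoid per label

# Toy example: 4 hypothetical feedback criteria, random weights.
rng = np.random.default_rng(0)
num_labels, d_in, d_out = 4, 8, 16
label_emb = rng.normal(size=(num_labels, d_in))
W = rng.normal(scale=0.1, size=(d_in, d_out))
a = rng.normal(scale=0.1, size=2 * d_out)
A = np.eye(num_labels)
A[0, 1] = A[1, 0] = 1                           # assumed: criteria 0 and 1 are related
A[2, 3] = A[3, 2] = 1                           # assumed: criteria 2 and 3 are related
video_feat = rng.normal(size=d_out)             # stand-in for a pooled VideoMAE feature

_, alpha = gat_layer(label_emb, A, W, a)
probs = predict_feedback(video_feat, label_emb, A, W, a)
```

In this sketch each row of `alpha` is a distribution over the neighbours of one criterion, so a criterion's representation is a weighted mix of itself and the criteria it is linked to. A trained system would learn `W`, `a`, and the label embeddings jointly with the video encoder, and would be evaluated per criterion with a metric such as AUC.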

External IDs: dblp:conf/isbi/XiaSPVS25