SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition

Shu Yang, Zhiyuan Cai, Luyang Luo, Ning Ma, Shuchang Xu, Hao Chen

Published: 01 Jan 2025, Last Modified: 09 Nov 2025 | IEEE Transactions on Medical Imaging | Readers: Everyone | License: CC BY-SA 4.0
Abstract: Capitalizing on image-level pre-trained models for various downstream tasks has recently shown promising performance. However, the paradigm of “image pre-training followed by video fine-tuning” for high-dimensional video data inevitably introduces significant performance bottlenecks. Furthermore, in the medical domain, many surgical video tasks encounter additional challenges posed by the limited availability of video data and the necessity for comprehensive spatiotemporal modeling. Recently, Parameter-Efficient Image-to-Video Transfer Learning (PEIVTL) has emerged as an efficient and effective paradigm for video action recognition tasks. It employs image-level pre-trained models with promising feature transferability and adds cross-modality temporal modeling with minimal fine-tuning. Nevertheless, the effectiveness and generalizability of this paradigm within the intricate surgical domain remain unexplored. In this paper, we delve into a novel problem of efficiently adapting image-level pre-trained models to specialize in fine-grained surgical phase recognition, termed Parameter-Efficient Image-to-Surgical-Video Transfer Learning. First, we develop SurgPETL, a parameter-efficient transfer learning framework for surgical phase recognition, and conduct extensive experiments with three advanced methods based on ViTs of two distinct scales pre-trained on five large-scale natural and medical datasets. Then, we introduce the Adaptive Spatiotemporal Representation Modulation (ASRM) module, integrating a standard spatial adapter with a novel temporal adapter to capture detailed spatial features and establish connections across temporal sequences for robust spatiotemporal modeling. Extensive experiments on three challenging datasets spanning various surgical procedures demonstrate the effectiveness of SurgPETL with ASRM. SurgPETL-ASRM outperforms both parameter-efficient alternatives and state-of-the-art surgical phase recognition methods while maintaining parameter efficiency and minimizing overhead.
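To make the idea of pairing a spatial adapter with a temporal adapter concrete, here is a minimal NumPy sketch of a generic bottleneck adapter design. It is not the authors' ASRM implementation, which the abstract does not specify. The class names `SpatialAdapter` and `TemporalAdapter`, the bottleneck width of 64, the depthwise temporal kernel of size 3, and the zero-initialised up-projection are all assumptions chosen for illustration.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class SpatialAdapter:
    """Hypothetical bottleneck adapter that acts on each token independently."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Assumption: zero-initialised up-projection, so the adapter starts
        # as an identity map and leaves the frozen backbone's features intact.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def hidden(self, x):
        return gelu(x @ self.w_down + self.b_down)

    def __call__(self, x):
        # x has shape (frames, tokens, dim); the residual keeps the input.
        return x + self.hidden(x) @ self.w_up + self.b_up

    def num_params(self):
        return self.w_down.size + self.b_down.size + self.w_up.size + self.b_up.size


class TemporalAdapter(SpatialAdapter):
    """Hypothetical adapter that also mixes bottleneck features across frames."""

    def __init__(self, dim, bottleneck, rng, kernel_size=3):
        super().__init__(dim, bottleneck, rng)
        assert kernel_size % 2 == 1, "odd kernel keeps the frame count unchanged"
        # One temporal filter per bottleneck channel (depthwise convolution).
        self.kernel = rng.normal(0.0, 0.02, size=(kernel_size, bottleneck))

    def hidden(self, x):
        h = x @ self.w_down + self.b_down
        k = self.kernel.shape[0]
        pad = k // 2
        # Zero-pad along the frame axis, then sum shifted copies of h.
        padded = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
        frames = h.shape[0]
        mixed = sum(self.kernel[i] * padded[i:i + frames] for i in range(k))
        return gelu(mixed)

    def num_params(self):
        return super().num_params() + self.kernel.size


rng = np.random.default_rng(0)
# 8 frames, 197 tokens (196 patches + class token for ViT-B/16 at 224x224), width 768
x = rng.normal(size=(8, 197, 768))
spatial = SpatialAdapter(768, 64, rng)
temporal = TemporalAdapter(768, 64, rng)
y = temporal(spatial(x))
print(y.shape, spatial.num_params(), temporal.num_params())
```

Under these assumptions, each adapter adds roughly 0.1M trainable parameters per insertion point. That is small next to a ViT-B backbone of about 86M parameters, which stays frozen in this style of transfer learning.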