SPHASE: Multi-Modal and Multi-Branch Surgical Phase Segmentation Framework based on Temporal Convolutional Network

Published: 01 Jan 2023, Last Modified: 24 Jun 2025, BIBM 2023, CC BY-SA 4.0
Abstract: Surgical phase segmentation plays an important role in computer-assisted surgery systems, aiming to recognize which step or action is being performed in each video frame. Existing methods focus on improving the accuracy and precision of video segmentation but ignore the semantic consistency and temporal continuity of video frames within a phase, which are necessary for deployment in real computer-assisted equipment. Meanwhile, most recent works extract long-term dependencies with a Temporal Convolutional Network (TCN), but we observe that the higher layers of a TCN lose the fine-grained information needed for detecting surgical steps, which in turn affects the phase segmentation task. To address these problems, we propose a Surgical Phase Segmentation Framework (SPHASE), which consists of a multimodal feature fusion process followed by a multi-branch predictor. Moreover, we design a multimodal feature fusion mechanism for aggregating optical flow features and I3D features. Extensive experiments on the AutoLaparo, Cholec80, and M2CAI2016 datasets demonstrate that our method outperforms the state-of-the-art method by a large margin, especially on the JACC metric, which indicates that SPHASE is more applicable in the surgical operating room.