Keywords: SegFormer, Surgical tools, Segmentation
Abstract: As robotic minimally invasive surgeries gain popularity, enhancing surgical training with deep learning solutions has become a focal point for research. A thorough understanding of the surgical scene components is crucial, which semantic segmentation models can help achieve. Most existing work focuses on surgical tools, often overlooking anatomical objects. Additionally, current state-of-the-art (SOTA) models struggle to balance capturing high-level contextual features and low-level edge features. We present a Feature-Adaptive Spatial Localization model (FASL-Seg), designed to capture features at varying levels of detail effectively. FASL-Seg uses two distinct processing streams, a Low-Level Feature Projection (LLFP) stream and a High-Level Feature Projection (HLFP) stream, for different feature resolutions, enabling precise segmentation of anatomy and surgical instruments. We evaluated FASL-Seg on the surgical segmentation benchmark datasets EndoVis18 and EndoVis17 across three use cases. FASL-Seg achieves a mean Intersection over Union (mIoU) of 72.71% on parts and anatomy segmentation in EndoVis18, improving on the SOTA by 5%. It further achieves mIoUs of 85.61% and 72.78% on tool-type segmentation in EndoVis18 and EndoVis17, respectively, outperforming the overall SOTA performance while delivering per-class results comparable to the SOTA on both datasets and consistent performance across anatomy and instrument classes, demonstrating the effectiveness of distinct processing streams for features of varying resolution.
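To make the two-stream idea concrete, below is a minimal PyTorch sketch of a dual-stream decode head over SegFormer-style hierarchical features. The abstract does not specify the internals of LLFP and HLFP, so the module names `FeatureProjection` and `TwoStreamSegHead`, the channel sizes, and the concatenation-based fusion are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureProjection(nn.Module):
    """Projects a group of encoder feature maps to a common embedding
    size and upsamples them to a shared spatial resolution (assumed
    stand-in for an LLFP/HLFP stream)."""

    def __init__(self, in_channels, embed_dim):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(embed_dim * len(in_channels), embed_dim,
                      kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats, out_size):
        projected = [
            F.interpolate(p(f), size=out_size,
                          mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        return self.fuse(torch.cat(projected, dim=1))


class TwoStreamSegHead(nn.Module):
    """Hypothetical decode head: a low-level stream for high-resolution
    edge features and a high-level stream for low-resolution contextual
    features, fused before per-pixel classification."""

    def __init__(self, num_classes, embed_dim=256,
                 low_channels=(64, 128), high_channels=(320, 512)):
        super().__init__()
        self.llfp = FeatureProjection(low_channels, embed_dim)
        self.hlfp = FeatureProjection(high_channels, embed_dim)
        self.classifier = nn.Conv2d(embed_dim * 2, num_classes, kernel_size=1)

    def forward(self, encoder_feats):
        # encoder_feats: multi-scale maps from a hierarchical backbone
        # (e.g. SegFormer's MiT encoder), ordered shallow -> deep.
        low, high = encoder_feats[:2], encoder_feats[2:]
        out_size = low[0].shape[-2:]  # decode at the finest resolution
        fused = torch.cat([self.llfp(low, out_size),
                           self.hlfp(high, out_size)], dim=1)
        return self.classifier(fused)


# Usage with SegFormer-like feature shapes (strides 4, 8, 16, 32):
feats = [torch.randn(1, 64, 128, 128), torch.randn(1, 128, 64, 64),
         torch.randn(1, 320, 32, 32), torch.randn(1, 512, 16, 16)]
head = TwoStreamSegHead(num_classes=12)
logits = head(feats)  # (1, 12, 128, 128); upsample to input size for the loss
```

The design choice this sketch illustrates is keeping shallow, high-resolution features on a separate path from deep, low-resolution ones, so edge detail is not washed out by context aggregation before fusion.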
Track: Track 2: ML by Muslim Authors
Submission Number: 17