ECT-3DMedSAM: Efficient Cross Teaching Using Segment Anything Model for Semi-Supervised 3D Medical Image Segmentation
Keywords: Semi-supervised learning, 3D image segmentation, Transformer, Segment Anything Model, Multi-modalities
Abstract: The advent of foundation models has established new benchmarks in volumetric medical image segmentation by leveraging large-scale pre-training and temporal context. However, effectively adapting these data-hungry models to downstream tasks with limited annotations remains a critical bottleneck. Standard Semi-Supervised Medical Image Segmentation (SSMIS) approaches typically rely on conventional CNNs, which lack the semantic generalization capabilities required for complex 3D anatomical structures. In this paper, we propose a novel cross-teaching framework tailored for the efficient adaptation of a 3D foundation model (MedSAM-2). We introduce a parameter-efficient design that shares frozen image and prompt encoders between two parallel mask decoders, each made trainable through Low-Rank Adaptation (LoRA). Furthermore, we replace the memory-intensive attention mechanism with a simplified temporal propagation module, reducing memory consumption while maintaining critical local volumetric coherence. Our model processes the same input volume under weak and strong augmentations, creating a synergistic learning loop in which the two decoders mutually supervise each other. We validate our method across three distinct datasets and modalities. Experimental results demonstrate that our framework effectively bridges the domain gap across different modalities and improves segmentation accuracy, particularly boundary precision, compared to existing baselines.
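The abstract describes a shared frozen encoder feeding two LoRA-adapted decoders that cross-supervise each other on weakly and strongly augmented views. The sketch below is a minimal, hypothetical illustration of that objective only; the module names (FrozenEncoder, LoRADecoder), tensor shapes, and augmentations are stand-ins and not the authors' released implementation.

```python
# Minimal sketch of the cross-teaching objective described in the abstract.
# All module names, shapes, and augmentations are illustrative assumptions,
# not the paper's actual MedSAM-2-based code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenEncoder(nn.Module):
    """Stand-in for the shared, frozen image encoder."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1)
        for p in self.parameters():
            p.requires_grad = False  # encoder weights stay frozen

    def forward(self, x):
        return self.conv(x)


class LoRADecoder(nn.Module):
    """Stand-in for a mask decoder whose only trainable part is a low-rank update."""
    def __init__(self, feat_ch=32, rank=4, num_classes=2):
        super().__init__()
        self.base = nn.Conv3d(feat_ch, num_classes, kernel_size=1)
        for p in self.base.parameters():
            p.requires_grad = False  # base decoder weights frozen
        # LoRA-style low-rank residual path (the trainable parameters)
        self.lora_down = nn.Conv3d(feat_ch, rank, kernel_size=1, bias=False)
        self.lora_up = nn.Conv3d(rank, num_classes, kernel_size=1, bias=False)

    def forward(self, feat):
        return self.base(feat) + self.lora_up(self.lora_down(feat))


def cross_teaching_loss(logits_a, logits_b):
    """Each decoder is supervised by the other's detached hard pseudo-labels."""
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)


if __name__ == "__main__":
    encoder = FrozenEncoder()
    dec_a, dec_b = LoRADecoder(), LoRADecoder()

    volume = torch.randn(1, 1, 16, 64, 64)            # unlabeled 3D volume
    weak = volume + 0.01 * torch.randn_like(volume)    # weak augmentation (placeholder)
    strong = volume + 0.10 * torch.randn_like(volume)  # strong augmentation (placeholder)

    loss = cross_teaching_loss(dec_a(encoder(weak)), dec_b(encoder(strong)))
    loss.backward()  # gradients flow only into the LoRA parameters
    print(float(loss))
```

In an actual semi-supervised run, this unsupervised cross-teaching term would be combined with a standard supervised loss on the labeled subset.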
Primary Subject Area: Segmentation
Secondary Subject Area: Learning with Noisy Labels and Limited Data
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 169