Abstract: Recent adaptations of the powerful and promptable Segment Anything Model (SAM), pretrained on a large-scale dataset, have shown promising results in medical image segmentation. However, existing methods fail to fully leverage the intermediate features from SAM's image encoder, limiting SAM's adaptability. To address this, we introduce MoE-SAM, a novel approach that enhances SAM by incorporating a Mixture-of-Experts (MoE) during adaptation. Central to MoE-SAM is an MoE-driven feature-enhancing block, which uses learnable gating functions and expert networks to select, refine, and fuse latent features from multiple layers of SAM's image encoder. By combining these features, the model produces a more robust image embedding that captures both low-level local and high-level global information. This comprehensive embedding facilitates prompt-embedding generation and mask decoding, thereby enabling more effective self-prompting segmentation. Extensive evaluations on four benchmark medical image segmentation tasks show that MoE-SAM outperforms both task-specialized models and other SAM-based approaches, achieving state-of-the-art segmentation accuracy. The code is available at: https://github.com/Asphyxiate-Rye/E-SAM.
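The abstract does not specify implementation details of the MoE-driven feature-enhancing block; the following is a minimal PyTorch sketch of the general idea it describes (gating over, refining, and fusing features tapped from several encoder layers). The class name `MoEFeatureEnhancer`, the per-layer MLP experts, and the pooled-summary softmax gate are all illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MoEFeatureEnhancer(nn.Module):
    """Hypothetical sketch of an MoE-driven feature-enhancing block:
    selects, refines, and fuses latent features from multiple layers of
    SAM's image encoder into a single enhanced image embedding."""

    def __init__(self, num_layers: int, dim: int, hidden: int = 256):
        super().__init__()
        # One expert network per tapped encoder layer (assumed design).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_layers)
        )
        # Learnable gating function: scores each layer from a pooled descriptor.
        self.gate = nn.Linear(dim, num_layers)

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # layer_feats: list of (B, N, dim) token maps from the image encoder.
        stacked = torch.stack(layer_feats, dim=1)             # (B, L, N, dim)
        summary = stacked.mean(dim=(1, 2))                    # (B, dim)
        weights = torch.softmax(self.gate(summary), dim=-1)   # (B, L) gate weights
        # Refine each layer's features with its dedicated expert.
        refined = torch.stack(
            [expert(f) for expert, f in zip(self.experts, layer_feats)], dim=1
        )                                                     # (B, L, N, dim)
        # Fuse: gate-weighted sum over layers yields the enhanced embedding,
        # mixing low-level local and high-level global information.
        return (weights[:, :, None, None] * refined).sum(dim=1)  # (B, N, dim)
```

In this sketch the fused embedding would then be consumed by SAM's prompt-embedding generation and mask decoder, per the self-prompting pipeline the abstract describes.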