ExpertsMedSAM: Faster Medical Image Segment Anything with Mixture-of-Experts

Li Zhi, Yaqi Wang, Shuai Wang

Published: 01 Jan 2025 · Last Modified: 05 Nov 2025 · Crossref · CC BY-SA 4.0
Abstract: The Segment Anything Model (SAM) demonstrates remarkable performance in image segmentation but is limited by its large ViT-H encoder, restricting deployment on resource-constrained devices. LiteMedSAM addresses this by incorporating compact encoders such as Tiny-ViT, reducing parameters while maintaining performance. However, it underperforms in complex cases such as funduscopic image segmentation using Scribble-Prompt. Scribble-Prompt allows detailed annotations suitable for small or intricate structures but lacks mature optimization strategies in medical image segmentation. To enhance performance in challenging modalities such as funduscopic images, we propose ExpertsMedSAM, a multi-expert fusion model integrating Tiny-ViT with a Scribble-Guided Mask Decoder. This approach employs a hybrid multi-expert training strategy and an efficient output fusion method, significantly improving segmentation under Scribble-Prompt conditions while maintaining stability across other modalities. Experimental results show substantial improvements over baseline models. The code is available at https://github.com/RicoLeehdu/ExpertsMedSAM.git.
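The abstract does not specify the output fusion rule, but a common approach for mixture-of-experts segmentation is a softmax-gated weighted average of per-expert mask logits. The sketch below is a minimal illustration under that assumption; the function name `fuse_expert_masks` and the gating scheme are hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_expert_masks(expert_logits: np.ndarray, gate_scores: np.ndarray) -> np.ndarray:
    """Fuse per-expert mask logits with softmax gate weights.

    expert_logits: array of shape (E, H, W), one logit map per expert.
    gate_scores:   array of shape (E,), unnormalized gating scores.
    Returns a fused (H, W) logit map.

    NOTE: illustrative only; the actual ExpertsMedSAM fusion may differ.
    """
    # Numerically stable softmax over the expert dimension.
    w = np.exp(gate_scores - gate_scores.max())
    w /= w.sum()
    # Weighted sum over the expert axis: (E,) x (E, H, W) -> (H, W).
    return np.tensordot(w, expert_logits, axes=1)

# Example: two experts with equal gate scores average their logits.
logits = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
fused = fuse_expert_masks(logits, np.array([0.0, 0.0]))
```

With equal gate scores the fusion reduces to a plain mean of the expert logit maps; unequal scores let a gating network favor the expert best suited to the input modality (e.g., funduscopic images).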