Keywords: MedSAM, Rep-ViT, Medical Images
Abstract: Medical image segmentation is a pivotal step in clinical practice, enabling more precise analysis of medical images. MedSAM, a foundation model for medical image segmentation, has significantly extended the ability of SAM to segment a broad spectrum of medical imaging modalities and achieves excellent performance compared with specialist models. However, because of its heavy image encoder, MedSAM is too slow for clinical use. The CVPR 2024: Segment Anything In Medical Images On Laptop Challenge therefore targets both accuracy and efficiency in a setting where the model must run inference on CPU only. To this end, we propose Rep-MedSAM, which replaces the image encoder in MedSAM with RepViT, a mobile-friendly CNN that incorporates the efficient design choices of lightweight ViTs. Our method is simple but effective and consists of knowledge distillation from the pretrained MedSAM, whole-pipeline training, and fine-tuning with extra datasets. All experiments are conducted within the challenge. Our method achieved an average DSC of $85.90\%$ and an average NSD of $87.07\%$ on validation. In terms of time cost, our method compares favorably with the baseline on validation: the average time for 2D and 3D cases is $0.47$s and $22.47$s, respectively, with an overall average of $2.41$s per case. Our code is available at https://github.com/mxWe1/CVPR24-Challenge.
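For readers unfamiliar with the DSC metric reported above, the following is a minimal sketch assuming the standard definition of the Dice Similarity Coefficient for two binary masks, $\mathrm{DSC} = 2|A \cap B| / (|A| + |B|)$. The function name `dice_similarity_coefficient`, the toy masks, and the empty-mask convention are illustrative assumptions; this is not the challenge's evaluation code, and NSD is not covered.

```python
def dice_similarity_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary 2D masks.

    Both masks are nested lists (rows of 0/1 values) of the same shape.
    Hypothetical helper for illustration only.
    """
    # Flatten both masks into lists of booleans.
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    # |A| + |B|: total foreground pixels across both masks.
    denom = sum(pred_flat) + sum(target_flat)

    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / denom


# Toy example: 3 foreground pixels in each mask, 2 of them shared.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
dsc = dice_similarity_coefficient(pred_mask, true_mask)
print(f"DSC = {dsc:.4f}")
```

In the toy example the masks share 2 of their 3 foreground pixels each, so the score is 2·2 / (3 + 3) ≈ 0.667. A validation-level figure such as the one in the abstract would be an average of per-case scores like this one.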
Submission Number: 17