Keywords: RepViT, Segment Anything, SAM, Medical Image Segmentation, Image Segmentation, Efficient Neural Networks, CNN, SAM and RepViT
TL;DR: We propose RepViT-MedSAM, an efficient model for the segment anything in medical images problem, created by distilling the original MedSAM image encoder into a RepViT backbone.
Abstract: Segmenting medical images to identify lesions, organs, and other areas of interest is crucial for diagnosis and treatment decisions. Traditionally, segmentation is performed with manual tools or with automated task-specific neural network models. A promising alternative is to create general-purpose models for segment anything in medical images, such as MedSAM~\cite{MedSAM}. These foundation models can segment regions across a multitude of modalities at levels comparable to task-specific models. However, they are often large and computationally expensive, which prevents their use in clinical settings that lack dedicated GPUs. We propose RepViT-MedSAM, an efficient model for the segment anything in medical images problem, created through a two-step training process. First, the image encoder of MedSAM is distilled into a more efficient RepViT feature extractor using aggressively augmented medical images. Then the entire end-to-end model, including the prompt encoder and mask decoder, is fine-tuned using ground truth masks and MedSAM's predictions. On the test set, RepViT-MedSAM surpasses baseline MedSAM in both accuracy and efficiency, achieving an average Dice Similarity Coefficient (DSC) of 0.8528, an average Normalized Surface Distance (NSD) of 0.8666, and a total execution time of 195 seconds, and ranking 12th out of 23 contestants. With its efficiency and accuracy, RepViT-MedSAM offers a promising solution for real-world medical image segmentation. The code for this project is available at https://github.com/icecap360/TurboMedSAM.
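For readers unfamiliar with the headline metric, the Dice Similarity Coefficient between a predicted mask A and a ground truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch in plain Python, not the challenge's official evaluation code; the function name `dice_coefficient` and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(
    pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]
) -> float:
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for two binary 2D masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    # Count pixels that are foreground in both masks.
    intersection = sum(
        1
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
        if p and t
    )
    # Count foreground pixels in each mask separately.
    pred_area = sum(1 for row in pred for p in row if p)
    target_area = sum(1 for row in target for t in row if t)

    if pred_area + target_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(dice_coefficient(prediction, ground_truth))
```

In the usage example, one pixel overlaps while the masks contain two and one foreground pixels respectively, so the score is 2·1 / (2 + 1). A reported average DSC such as 0.8528 would be the mean of per-case scores of this kind.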
Submission Number: 8