Modality-Specific Strategies for Medical Image Segmentation using Lightweight SAM Architectures

Thuy Thanh Dao; Xincheng Ye; Joshua Scarsbrook; Gowrienanthan Balarupan; Fernanda Lenita Ribeiro; Steffen Bollmann

Published: 11 Oct 2024, Last Modified: 11 Oct 2024

Venue: CVPR24 MedSAMonLaptop

License: CC BY-SA 4.0

Keywords: Multi-modality, Zero-shot, OpenVINO, CPU Deployment

TL;DR: We optimized medical image segmentation on CPU by selecting a lightweight model for each modality and improving runtime efficiency with the OpenVINO format.

Abstract: Medical image segmentation tasks are often intricate and require medical domain expertise. Recent advances in deep learning have accelerated these demanding tasks, shifting from specialized models tailored to each task to versatile foundation models that accommodate various image modalities. However, many of these foundation models are optimized for GPU computation, which demands significant computational resources and limits their practical utility in clinical settings. Furthermore, their accuracy varies across modalities and novel domains, which undermines their reliability in clinical practice. To address these limitations, we conduct a comparative investigation into deploying medical image segmentation models on CPU, focusing on accuracy and runtime efficiency, as part of the "CVPR 2024: Segment Anything In Medical Images On Laptop" challenge. Our methodology employs different models customized for each modality, including pre-trained EfficientViT-SAM and LiteMedSAM, to yield the most precise and efficient outcomes. Additionally, to improve model performance on datasets with small regions of interest, such as PET scans, we integrate a majority voting mechanism. We optimize runtime using the OpenVINO format within a C++ inference script. This approach improves inference runtime while maintaining competitive accuracy, achieving an average Dice similarity coefficient (DSC) of 0.86 on the validation set and 0.75 on the testing set, with an average runtime of 4.61 s on the testing set. Notably, given that most modalities are evaluated in a zero-shot manner, our findings suggest that the zero-shot capability of foundation models can be further refined through dataset-specific inference strategies.

Submission Number: 6
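
The abstract reports accuracy as DSC and mentions a majority voting mechanism for small regions of interest, but it does not say how the candidate masks are produced or combined. As a minimal sketch only, assuming binary masks stored as nested lists and a strict pixel-wise majority rule (both assumptions, and the helper names `dice` and `majority_vote` are hypothetical rather than the authors' code), the two ideas could look like this:

```python
from typing import List

Mask = List[List[int]]  # binary mask: rows of 0/1 values (assumed layout)


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def majority_vote(masks: List[Mask]) -> Mask:
    """Keep a pixel when strictly more than half of the masks mark it."""
    n = len(masks)
    return [
        # zip(*rows) groups the values that all masks give to one pixel.
        [1 if 2 * sum(votes) > n else 0 for votes in zip(*rows)]
        # zip(*masks) groups the rows that share the same row index.
        for rows in zip(*masks)
    ]


if __name__ == "__main__":
    candidates = [
        [[1, 1], [0, 0]],
        [[1, 0], [0, 0]],
        [[1, 1], [0, 1]],
    ]
    fused = majority_vote(candidates)
    print(fused, dice(fused, [[1, 0], [0, 0]]))
```

With an odd number of candidate masks a strict majority never ties; with an even number, this sketch drops tied pixels, which is one possible choice among several.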
