LGA: A Language Guide Adapter for Advancing the SAM Model's Capabilities in Medical Image Segmentation
Abstract: In addressing the unique challenges of medical image segmentation, foundation models such as the Segment Anything Model (SAM), originally developed for natural images, often falter due to the distinct nature of medical images. This study introduces the Language Guide Adapter (LGA), a parameter-efficient fine-tuning approach that extends SAM’s utility to medical segmentation tasks. LGA encodes textual information from medical reports into embeddings with a pretrained BERT model and fuses these embeddings with the image features in SAM’s image encoder through Feature Fusion Modules (FFM). By freezing most parameters during fine-tuning, our method significantly improves performance while reducing computational overhead. Evaluated on the CT-based MosMedData+ dataset and the X-ray dataset QaTa-COV19, LGA demonstrates its effectiveness and adaptability, achieving competitive results with a substantial reduction in the number of trainable parameters compared to SOTA medical segmentation models. These results underscore the potential of foundation models, augmented with multimodal knowledge, for specialized medical tasks, charting a course toward more precise and adaptable diagnostic methodologies. The code is available at https://github.com/JiHooooo/LGA.
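To make the adapter idea concrete, below is a minimal PyTorch sketch of how BERT text embeddings could be fused with frozen SAM image-encoder features while keeping both backbones frozen. The class name `FeatureFusionModule`, the cross-attention fusion, and the residual placement are illustrative assumptions; the paper's actual FFM design and insertion points may differ (see the official repository for the authors' implementation).

```python
# Illustrative sketch only: names and the cross-attention design are assumptions,
# not the authors' exact FFM implementation.
import torch
import torch.nn as nn


class FeatureFusionModule(nn.Module):
    """Fuses SAM image-encoder tokens with BERT text embeddings (assumed design)."""

    def __init__(self, img_dim: int, txt_dim: int, num_heads: int = 8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, img_dim)  # project text into image feature space
        self.cross_attn = nn.MultiheadAttention(img_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # img_tokens: (B, N_img, img_dim); txt_tokens: (B, N_txt, txt_dim)
        txt = self.txt_proj(txt_tokens)
        fused, _ = self.cross_attn(query=img_tokens, key=txt, value=txt)
        # Residual connection keeps the frozen image features intact.
        return self.norm(img_tokens + fused)


def freeze_backbones(sam_image_encoder: nn.Module, bert: nn.Module) -> None:
    """Freeze the large pretrained backbones so only adapter modules are trained."""
    for p in sam_image_encoder.parameters():
        p.requires_grad = False
    for p in bert.parameters():
        p.requires_grad = False
```

In this parameter-efficient setup, only the fusion modules (and any lightweight decoder layers) would receive gradients, which is what keeps the number of trainable parameters small relative to full fine-tuning.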