LangMedSAM: Scalable Adaptation of Medical Segment Anything Model (MedSAM) for Language-Prompted Medical Image Segmentation

ICLR 2026 Conference Submission 21360 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Medical Image Computing, Image Segmentation, Foundational Model
TL;DR: We propose LangMedSAM, a multi-modal segmentation model that uses natural language prompts to generate anatomical and pathological masks, reducing dependence on manual bounding boxes while maintaining strong CT and MR performance.
Abstract: Image segmentation is a crucial component of medical imaging, facilitating precise analysis and diagnosis by identifying anomalies and structures across various imaging modalities. Recent advances have led to foundational medical image segmentation models such as MedSAM. Trained on a large corpus of medical images, MedSAM generates segmentation masks from user prompts such as bounding boxes and points. For faster inference, LiteMedSAM, a lightweight variant of MedSAM, offers a more computationally practical solution while maintaining comparable performance. However, manually providing bounding boxes for each 2D slice in volumetric imaging remains cumbersome and hinders the automatic processing of large datasets. To address this, we introduce LangMedSAM, a multi-modal text-based segmentation model that uses natural language prompts for mask generation in radiological images. LangMedSAM is trained on 20 publicly available medical datasets and evaluated both on these datasets and on 4 additional external datasets to assess generalizability. Building on LiteMedSAM’s architecture, it supports segmentation via both text-based prompts and conventional inputs such as bounding boxes. Our results show that text-based prompts provide a scalable and effective solution for multi-modal and multi-region medical image segmentation, offering a practical alternative to conventional prompting methods in MedSAM, particularly for the automated processing of large collections of scans.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21360