SPG-SAM: Semantic Prompt Graph Learning for Multi-class Medical Image Segmentation

ICLR 2026 Conference Submission 22657 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Medical Image Segmentation, Segment Anything Model, Semantic Prompt Graph, Multi-class Segmentation, Graph Attention Network
TL;DR: We propose SPG-SAM, a Semantic Prompt Graph-enhanced SAM framework for multi-class medical image segmentation, leveraging anatomical relationships and graph attention to improve segmentation accuracy.
Abstract: Existing visual foundation model-based methods (e.g., SAM) for multi-class medical image segmentation typically face a trade-off between insufficient semantic information and spatial prompt interference, while extending SAM with fully automated semantic segmentation compromises its inherent interactive prompting capabilities. To bridge the semantic specificity gap, we propose SPG-SAM (Semantic Prompt Graph learning for SAM), a novel framework that seamlessly integrates spatial and semantic prompting for efficient and accurate multi-class medical image segmentation. SPG-SAM introduces dedicated semantic prompts to complement SAM's spatial prompts, establishing an explicit mapping between object locations and semantic categories. Furthermore, we introduce a semantic prompt graph learning module that employs a graph attention network to explicitly model anatomical priors and structural relationships among medical objects. This design enables cross-category feature interaction, mitigates prompt interference, and facilitates accurate and efficient multi-class segmentation within the SAM-based paradigm. Experimental results demonstrate that SPG-SAM achieves average Dice coefficients of 94.27% and 91.83% on the abdominal multi-organ segmentation (BTCV) and pelvic target segmentation (PelvicRT) tasks, respectively, outperforming the second-best state-of-the-art baselines by 2.1% and 3.65%. The code will be made available.
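The abstract describes refining per-class prompt embeddings with graph attention so that prompt tokens for different anatomical categories can interact before mask decoding. The sketch below is an illustrative reading of that idea, not the authors' released code: the embedding size, class count, learnable class tokens, fully connected adjacency, and the single-layer attention are all assumptions made for the example.

```python
# Illustrative sketch (not the authors' implementation): a single graph-attention
# layer over K semantic prompt tokens (one per anatomical class). Spatial prompt
# features are fused with learnable class tokens, then refined via attention
# restricted to an anatomical adjacency graph, before being passed to a
# SAM-style mask decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticPromptGraphLayer(nn.Module):
    """Graph attention over per-class prompt embeddings (hypothetical module)."""

    def __init__(self, dim: int = 256, num_classes: int = 13):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)
        # One learnable semantic prompt token per category (assumed design).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, dim) * 0.02)

    def forward(self, spatial_prompt_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """
        spatial_prompt_emb: (B, K, D) spatial prompt features aligned to classes.
        adj: (K, K) binary adjacency encoding assumed anatomical relationships.
        Returns refined prompt embeddings of shape (B, K, D).
        """
        # Fuse spatial prompts with their semantic class tokens.
        x = spatial_prompt_emb + self.class_tokens.unsqueeze(0)

        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = torch.einsum("bkd,bld->bkl", q, k) / (x.shape[-1] ** 0.5)
        # Mask out class pairs with no anatomical edge, then normalize per node.
        attn = attn.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        attn = F.softmax(attn, dim=-1)

        refined = torch.einsum("bkl,bld->bkd", attn, v)
        return x + self.out(refined)  # residual connection


if __name__ == "__main__":
    layer = SemanticPromptGraphLayer(dim=256, num_classes=13)
    prompts = torch.randn(2, 13, 256)   # batch of spatial prompt features
    adj = torch.ones(13, 13)            # fully connected graph as a placeholder
    print(layer(prompts, adj).shape)    # torch.Size([2, 13, 256])
```

In this reading, masking the attention with an anatomical adjacency matrix is what injects structural priors, while the residual update keeps each class's original prompt information intact.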
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22657