P^2SAM: Probabilistically Prompted SAMs Are Efficient Segmentator for Ambiguous Medical Images

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | CC BY 4.0
Abstract: The ability to predict multiple plausible outputs for a single input helps address visual ambiguity, such as the differing semantic segmentation annotations that different experts provide for the same medical image. Existing methods employ various advanced probabilistic modeling techniques to model ambiguous predictions, but they often struggle to fit the underlying distribution over multiple outputs when only a limited amount of ambiguously labeled data is available, which is usually the case in real-world applications. To overcome this challenge, we propose a framework, termed P² SAM, that leverages prior knowledge from foundation models when segmenting ambiguous objects. We examine an inherent disadvantage of SAM, namely the sensitivity of its output to prompts, and turn it into an advantage for ambiguous segmentation by introducing a prompt generation module. Experimental results demonstrate that, using only a small number of doctor-annotated ambiguous samples, our strategy significantly enhances the precision and diversity of medical segmentation. In benchmarking experiments against cutting-edge methods, our method achieves higher segmentation precision and more diverse outputs with even less training data (5.5% sample, +12% $D_{max}$). P² SAM represents a steady step towards the practical deployment of probabilistic models in real-world data-limited scenarios.
Relevance To Conference: This work contributes to multimedia/multimodal processing by developing a novel framework that can generate a range of plausible outputs for a single input. This is particularly useful in scenarios where there is inherent ambiguity, such as when multiple experts provide differing semantic segmentation annotations for a single medical image. The proposed framework leverages prior knowledge from the Segment Anything Model (SAM) and introduces a prior probabilistic space for prompts, transforming a key limitation of SAM into an advantage for handling ambiguous segmentation tasks. This allows for significant enhancements in the precision and diversity of medical segmentation, even when only a limited amount of ambiguously labeled data is available. This research is a significant step towards the practical deployment of probabilistic models in real-world scenarios with limited data, which is a common challenge in multimedia/multimodal processing.
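Illustrative sketch (not the authors' implementation): the text above does not specify how the prompt generation module or the probabilistic prompt space is built. Purely as an assumption, the snippet below models the prompt space as a diagonal Gaussian over prompt embeddings and uses a toy linear decoder as a stand-in for SAM's promptable mask decoder. All names (`sample_prompts`, `toy_decoder`, `segment_with_sampled_prompts`) are hypothetical. It only shows the general idea that sampling different prompts for one image can yield several candidate masks.

```python
import numpy as np


def sample_prompts(mu, log_sigma, n_samples, rng):
    """Draw prompt embeddings z = mu + sigma * eps with eps ~ N(0, I).

    Assumption: a diagonal Gaussian prompt space; the source does not
    state which distribution the method actually uses.
    """
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.exp(log_sigma) * eps


def toy_decoder(image_feats, prompt):
    """Hypothetical stand-in for a promptable mask decoder (not SAM).

    image_feats: (H, W, D) per-pixel features, prompt: (D,) embedding.
    Returns a boolean mask of shape (H, W).
    """
    logits = image_feats @ prompt
    return logits > 0


def segment_with_sampled_prompts(image_feats, mu, log_sigma, n_samples=4, seed=0):
    """Return one candidate mask per sampled prompt, stacked as (n, H, W)."""
    rng = np.random.default_rng(seed)
    prompts = sample_prompts(mu, log_sigma, n_samples, rng)
    return np.stack([toy_decoder(image_feats, p) for p in prompts])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((8, 8, 16))   # fake image features
    mu = np.ones(16)                          # fake prompt mean
    log_sigma = np.zeros(16)                  # unit variance per dimension
    masks = segment_with_sampled_prompts(feats, mu, log_sigma, n_samples=4)
    print(masks.shape)
```

In this toy setup, a larger `log_sigma` spreads the sampled prompts further apart and so tends to produce more varied masks, while a very small one collapses all samples onto the mask obtained from the mean prompt.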
Supplementary Material: zip
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Generation] Multimedia Foundation Models
Submission Number: 1295