Text-Guided Diffusion Based Ambiguous Medical Image Segmentation

TMLR Paper 6269 Authors

21 Oct 2025 (modified: 25 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Medical image segmentation often suffers from ambiguity due to unclear boundaries, inconsistencies among experts, and varying interpretation standards. Traditional segmentation models produce a single deterministic output, failing to capture this uncertainty and the range of plausible interpretations in such cases. To address this, we introduce AmbiguousTextDiff, a novel text-guided diffusion model that generates diverse and plausible segmentation proposals reflecting the ambiguity observed in medical imaging. By combining the strengths of text-conditional diffusion models with ambiguity-aware training, our approach generates multiple valid segmentations for a single input image. We use descriptive text prompts, including anatomical and diagnostic attributes, as conditioning signals to guide segmentation. We generate these prompts by extracting detailed metadata from the LIDC-IDRI dataset, such as nodule size, texture, spiculation, and malignancy. This text-based conditioning improves both the controllability and clinical relevance of the model’s outputs, aligning them more closely with radiologist interpretation. Extensive evaluations and ablations on the LIDC-IDRI dataset demonstrate that AmbiguousTextDiff achieves superior performance across Combined Sensitivity, Diversity Agreement, Generalized Energy Distance (GED), and the Collective Insight (CI) Score, offering a comprehensive measure of both accuracy and uncertainty. Our results highlight the value of text-guided diffusion for ambiguity-aware segmentation and establish a new direction for controllable and interpretable medical image analysis.
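To make the two mechanisms in the abstract concrete for readers, here is a minimal sketch of how descriptive conditioning prompts could be assembled from the LIDC-IDRI attributes named above (nodule size, texture, spiculation, malignancy). The function name, dictionary keys, and wording are illustrative assumptions, not taken from the paper.

```python
def build_prompt(meta):
    """Compose a descriptive conditioning prompt from nodule metadata.

    `meta` keys mirror the LIDC-IDRI attributes mentioned in the
    abstract; the template phrasing is hypothetical.
    """
    return (
        f"a {meta['size']} mm lung nodule with {meta['texture']} texture, "
        f"{meta['spiculation']} spiculation, and {meta['malignancy']} malignancy"
    )

# Example usage with made-up attribute values:
prompt = build_prompt(
    {"size": 12, "texture": "solid", "spiculation": "marked", "malignancy": "suspicious"}
)
```

Likewise, a minimal sketch of the Generalized Energy Distance reported in the evaluation, using the standard squared-GED formulation D² = 2·E[d(S, Y)] − E[d(S, S′)] − E[d(Y, Y′)] with d = 1 − IoU on binary masks; the exact distance function used in the paper may differ.

```python
import numpy as np

def iou_distance(a, b):
    """1 - IoU between two binary masks; 0 for identical masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    if union == 0:  # both masks empty: treat as identical
        return 0.0
    return 1.0 - inter / union

def generalized_energy_distance(preds, refs):
    """Squared GED between model samples and expert annotations.

    preds: list of sampled segmentation masks for one image
    refs:  list of expert annotations for the same image
    Pairwise means below include identical pairs (distance 0),
    as is common in practical GED implementations.
    """
    cross = np.mean([iou_distance(s, y) for s in preds for y in refs])
    within_pred = np.mean([iou_distance(s, s2) for s in preds for s2 in preds])
    within_ref = np.mean([iou_distance(y, y2) for y in refs for y2 in refs])
    return 2 * cross - within_pred - within_ref
```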
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~changjian_shui1
Submission Number: 6269