Abstract: Medical image segmentation often suffers from ambiguity due to unclear boundaries, inter-expert inconsistencies, and varying interpretation standards. Traditional segmentation models produce a single deterministic output, failing to capture this uncertainty or the range of plausible interpretations in such cases. In this work, we introduce AmbiguousTextDiff, a novel text-guided diffusion model that generates diverse, plausible segmentation proposals reflecting the ambiguity observed in medical imaging. By combining the strengths of text-conditional diffusion models with ambiguity-aware training, our approach generates multiple valid segmentations for a single input image. We use descriptive text prompts incorporating anatomical, morphological, and diagnostic attributes as conditioning signals to guide segmentation. These prompts are generated by extracting clinical metadata from two diverse sources: the LIDC-IDRI lung nodule dataset (e.g., texture, spiculation, malignancy) and the IMA++ skin lesion dataset (e.g., anatomical site, pathology). This text-based conditioning improves both the controllability and the clinical relevance of the model’s outputs, aligning them more closely with expert interpretation. Extensive evaluations and ablations on both datasets demonstrate that AmbiguousTextDiff achieves superior performance across Combined Sensitivity, Diversity Agreement, Generalized Energy Distance (GED), and the Collective Insight (CI) Score. Our results highlight the value of text-guided diffusion for ambiguity-aware segmentation across multiple imaging modalities and establish a new direction for controllable and interpretable medical image analysis.
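To make the metadata-to-prompt step concrete, below is a minimal sketch of how per-nodule clinical attributes might be templated into a descriptive conditioning prompt. The attribute names (texture, spiculation, malignancy, rated 1-5 per reader) follow the LIDC-IDRI annotation schema mentioned in the abstract; the rating-to-phrase maps, the `build_prompt` template, and all wording are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (not the paper's pipeline): templating LIDC-IDRI-style
# nodule attributes into a descriptive text prompt for conditioning.
# Attribute names follow the LIDC-IDRI annotation schema; the
# rating-to-phrase maps and the prompt template are illustrative assumptions.

TEXTURE = {1: "ground-glass texture", 3: "part-solid texture", 5: "solid texture"}
SPICULATION = {1: "no spiculation", 3: "moderate spiculation", 5: "marked spiculation"}
MALIGNANCY = {
    1: "highly unlikely to be malignant",
    3: "indeterminate malignancy",
    5: "highly suspicious for malignancy",
}

def describe(scale: dict[int, str], rating: int) -> str:
    """Map a 1-5 reader rating to the closest descriptive phrase."""
    return scale[min(scale, key=lambda k: abs(k - rating))]

def build_prompt(attrs: dict[str, int]) -> str:
    """Compose a free-text prompt from one reader's per-nodule attributes."""
    return (
        f"lung nodule with {describe(TEXTURE, attrs['texture'])}, "
        f"{describe(SPICULATION, attrs['spiculation'])}, "
        f"{describe(MALIGNANCY, attrs['malignancy'])}"
    )

if __name__ == "__main__":
    # Two readers rating the same nodule differently yield different
    # prompts; this is the kind of ambiguity the model is meant to reflect.
    reader_a = {"texture": 5, "spiculation": 5, "malignancy": 5}
    reader_b = {"texture": 3, "spiculation": 1, "malignancy": 3}
    print(build_prompt(reader_a))
    print(build_prompt(reader_b))
```

At sampling time, prompts of this form would presumably be encoded (e.g., by a text encoder) and fed to the diffusion denoiser as the conditioning signal described above.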
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~changjian_shui1
Submission Number: 6269