Abstract: Medical image segmentation often suffers from ambiguity due to unclear boundaries, expert
inconsistencies, and varying interpretation standards. Traditional segmentation models
produce single deterministic outputs, failing to capture this uncertainty and the range of
plausible interpretations in such cases. To address this, we introduce AmbiguousTextDiff, a
novel text-guided diffusion model that generates diverse and plausible segmentation proposals
reflecting the ambiguity observed in medical imaging. By combining the strengths of text-
conditional diffusion models with ambiguity-aware training, our approach generates multiple
valid segmentations for a single input image. We use descriptive text prompts including
anatomical and diagnostic attributes as conditioning signals to guide segmentation. We
generate these prompts by extracting detailed metadata from the LIDC-IDRI dataset such
as nodule size, texture, spiculation, and malignancy. This text-based conditioning improves
both the controllability and clinical relevance of the model’s outputs, aligning them more
closely with radiologist interpretation. Extensive evaluations and ablations on the LIDC-IDRI
dataset demonstrate that AmbiguousTextDiff achieves superior performance across Combined
Sensitivity, Diversity Agreement, Generalized Energy Distance (GED), and Collective Insight
(CI) Score, offering a comprehensive measure of both accuracy and uncertainty. Our results
highlight the value of text-guided diffusion for ambiguity-aware segmentation and establish a
new direction for controllable and interpretable medical image analysis.
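
The abstract states that text prompts are derived from LIDC-IDRI nodule metadata (size, texture, spiculation, malignancy). The sketch below illustrates one plausible way such a prompt could be assembled; the attribute names, thresholds, and template wording are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: turning LIDC-style nodule annotations into a descriptive
# text prompt for conditioning. Attribute keys, value bins, and the sentence
# template are assumptions for illustration only.

def build_prompt(nodule_meta: dict) -> str:
    """Compose a text prompt from hypothetical LIDC-style nodule metadata.

    `nodule_meta` is assumed to hold values such as
    {"diameter_mm": 14.2, "texture": 5, "spiculation": 2, "malignancy": 4}.
    """
    size = "large" if nodule_meta["diameter_mm"] >= 10 else "small"
    texture = "solid" if nodule_meta["texture"] >= 4 else "part-solid or ground-glass"
    margin = "spiculated" if nodule_meta["spiculation"] >= 3 else "smooth-margined"
    risk = "suspicious for malignancy" if nodule_meta["malignancy"] >= 4 else "likely benign"
    return (f"A {size}, {texture}, {margin} pulmonary nodule, "
            f"{risk}, on an axial chest CT slice.")


if __name__ == "__main__":
    example = {"diameter_mm": 14.2, "texture": 5, "spiculation": 2, "malignancy": 4}
    print(build_prompt(example))
    # -> "A large, solid, smooth-margined pulmonary nodule,
    #     suspicious for malignancy, on an axial chest CT slice."
```

Such a prompt would then serve as the conditioning signal for the text-guided diffusion model described above; how the paper encodes and injects the text is not specified in the abstract.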
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~changjian_shui1
Submission Number: 6269