Controllable Preference Alignment for Ambiguous Medical Image Segmentation via Text and Dice Guidance
Keywords: Medical image segmentation, Ambiguity, Diffusion models, Clinical metadata, Direct Preference Optimization, DDIM Sampling, Multi-Modal framework
Abstract: In medical imaging, different experts often provide different but equally valid segmentations, so ambiguity is inherent to the task. A good model should therefore capture this variability by producing a distribution of plausible masks rather than a single deterministic output. Diffusion models are well suited to this because they can generate diverse samples, but standard training does not guarantee clinically meaningful segmentations, and prior approaches to ambiguous segmentation, including diffusion-based ones, lack semantic control. This work introduces a multi-modal framework that makes diffusion-based segmentation both controllable and clinically aligned. The model is conditioned on input images and on descriptive text derived from clinical metadata, and Direct Preference Optimization (DPO) is adapted to use Dice-based signals from multi-rater annotations in place of subjective human feedback. Three preference strategies are explored, with a consensus-based Mean Dice signal proving most effective. DDIM sampling accelerates inference by a factor of three, making the approach practical for clinical use. Experiments on LIDC-IDRI demonstrate state-of-the-art segmentation quality while preserving diversity, and introduce a controllable preference knob that lets practitioners directly adjust the balance between per-sample accuracy and distributional variability.
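As an illustration of the consensus-based Mean Dice signal mentioned in the abstract, the following is a minimal sketch of how a preferred/rejected pair for DPO could be selected from sampled masks. It is not the authors' implementation. The abstract does not specify the procedure, so the function names (`dice`, `mean_dice`, `select_preference_pair`), the nested-list mask representation, and the argmax/argmin pairing rule are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a binary mask as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap between two binary masks of identical shape."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    size_a = sum(1 for row in a for x in row if x)
    size_b = sum(1 for row in b for x in row if x)
    if size_a + size_b == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * inter / (size_a + size_b)


def mean_dice(candidate: Mask, rater_masks: Sequence[Mask]) -> float:
    """Average Dice of one candidate against every rater annotation."""
    return sum(dice(candidate, r) for r in rater_masks) / len(rater_masks)


def select_preference_pair(
    candidates: Sequence[Mask], rater_masks: Sequence[Mask]
) -> Tuple[int, int, List[float]]:
    """Return (preferred index, rejected index, per-candidate scores).

    Assumed pairing rule: the candidate with the highest Mean Dice is
    preferred and the one with the lowest is rejected.
    """
    scores = [mean_dice(c, rater_masks) for c in candidates]
    preferred = max(range(len(scores)), key=scores.__getitem__)
    rejected = min(range(len(scores)), key=scores.__getitem__)
    return preferred, rejected, scores
```

In a full pipeline the candidates would be masks sampled from the diffusion model for one image, and the resulting pair would feed a DPO-style loss. Array libraries would normally replace the nested lists used here for self-containment.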
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9396