Multi-Modal Medical Image Augmentation for Controlled Heterogeneity and Fair Outcomes

Published: 01 Jul 2025, Last Modified: 09 Jul 2025 · ICML 2025 R2-FM Workshop Poster · CC BY 4.0
Keywords: Medical Image Augmentation, Sketch-Conditioned Diffusion, Fairness and Class Imbalance, Diversity Metric
Abstract: Limited data in medical imaging exacerbates class imbalance and fairness gaps, undermining deep-learning performance across diverse patient subgroups. GAN- and diffusion-based augmenters can expand datasets but often lack precise control over multiple clinical attributes and fail to cover the full range of real-world variability. We introduce a four-step augmentation pipeline. First, an automated scoring function identifies which classes or regions most urgently need synthetic examples. Second, we construct sketch–image–text triplets from real scans, embedding age, sex, and disease labels. Third, we fine-tune a sketch-conditioned diffusion network for reliable sketch-to-image synthesis and boost variability by generating multiple, similarity-penalized sketches per case. Fourth, we propose a novel diversity metric that simultaneously measures semantic feature-space coverage and pixel-level dispersion—unlike FID or IS, it captures intra-class spread and boundary sharpness without human annotations. Experiments on chest X-rays show our pipeline delivers high-fidelity, diverse images aligned with user-specified conditions, substantially improving fairness and generalizability.
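The abstract does not give the formula for the proposed diversity metric, so the following is only a minimal illustrative sketch of one way to combine semantic feature-space coverage with pixel-level dispersion. The function names (`semantic_coverage`, `pixel_dispersion`, `diversity_score`), the weighting parameter `alpha`, and the use of mean pairwise cosine distance over pre-extracted embeddings are assumptions for illustration, not the authors' definition.

```python
# Hypothetical sketch of a combined diversity score. The paper's exact metric is
# not described in the abstract; the terms below (mean pairwise cosine distance
# over semantic embeddings + average per-pixel standard deviation) are assumptions.
import numpy as np


def semantic_coverage(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance over image embeddings of shape (n, d)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cosine_sim = normed @ normed.T
    n = len(embeddings)
    # Average off-diagonal cosine similarity, converted to a distance in [0, 2].
    off_diag = (cosine_sim.sum() - np.trace(cosine_sim)) / (n * (n - 1))
    return float(1.0 - off_diag)


def pixel_dispersion(images: np.ndarray) -> float:
    """Average per-pixel standard deviation across a stack of images (n, H, W)."""
    return float(images.std(axis=0).mean())


def diversity_score(embeddings: np.ndarray, images: np.ndarray,
                    alpha: float = 0.5) -> float:
    """Weighted combination of semantic coverage and pixel-level dispersion.

    alpha is an assumed mixing weight; the two terms live on different scales,
    so in practice each would need to be normalized before combining.
    """
    return alpha * semantic_coverage(embeddings) + (1 - alpha) * pixel_dispersion(images)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(16, 512))   # e.g. penultimate-layer CNN features
    fake_images = rng.random(size=(16, 224, 224))  # intensity-normalized synthetic X-rays
    print(f"diversity: {diversity_score(fake_embeddings, fake_images):.3f}")
```

As a usage note, the coverage term rewards synthetic sets whose embeddings spread out in feature space, while the dispersion term penalizes near-duplicate pixel content; how the paper balances or normalizes these two quantities is not stated in the abstract.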
Submission Number: 55