Clinically-Guided Counterfactuals (C³): Physics and Pathology-Aware Augmentation and Evaluation for Robust Medical Imaging Models

23 Sept 2025 (modified: 16 Oct 2025)
EurIPS 2025 Workshop MedEurIPS Submission
License: CC BY 4.0
Keywords: medical imaging, robustness, distribution shift, calibration, counterfactual augmentation, CT, MRI, fundus
TL;DR: Label-preserving, clinically grounded counterfactuals (physics + pathology aware) used for both training and evaluation improve OOD accuracy, calibration, and worst-case per-patient reliability across X-ray, MRI, and fundus tasks.
Abstract: Clinical deployment of imaging AI remains fragile. Routine distribution shifts (scanner vendor and reconstruction kernel, MRI protocol updates, dose and slice-profile changes, patient positioning and demographics, and device optics) can degrade performance in ways that standard leaderboards and generic augmentations fail to predict. We ask whether robustness and calibration can be improved, without compromising clinical validity, by training and evaluating models against \emph{label-preserving, clinically grounded counterfactuals}. We introduce \textit{Clinically-Guided Counterfactuals (C$^3$)}, a framework that (i) unifies physics-informed acquisition perturbations with tightly constrained, pathology-preserving semantic edits; (ii) screens all counterfactuals through a conservative validity gate; and (iii) reports \emph{shift-stable utility}, a worst-case, case-level score that complements AUROC, Dice, ECE, and Brier score. Across chest X-ray classification (CheXpert$\to$MIMIC-CXR), multiple sclerosis (MS) brain MRI segmentation (multi-site$\to$held-out site), and diabetic retinopathy (DR) grading (EyePACS$\to$Messidor-2), C$^3$ delivers consistent out-of-distribution (OOD) gains (e.g., macro-AUROC $+0.035$ on CXR, lesion-wise Dice $+0.044$ on MRI, and DR AUROC $+0.036$), tighter calibration, reduced prediction volatility under realistic shifts, and interpretable robustness diagnostics suitable for deployment checks.
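The abstract does not give a formula for shift-stable utility, so what follows is only a minimal sketch of one plausible reading, not the authors' definition. Each case is scored by its worst utility across the original image and every counterfactual that passed the validity gate, and these per-case minima are then averaged. Several details are assumptions: the `Variant` structure, the choice of per-view utility (1 minus the binary Brier score), and the policy of skipping cases with no gate-accepted view. All names here are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Variant:
    """One view of a case: the original image or a counterfactual (hypothetical structure)."""
    prob_positive: float  # model's predicted probability of the positive label
    passed_gate: bool     # whether the (assumed) validity gate accepted this view


def case_utility(prob_positive: float, label: int) -> float:
    """Per-view utility, assumed here to be 1 minus the binary Brier score."""
    return 1.0 - (prob_positive - label) ** 2


def shift_stable_utility(cases: dict[str, tuple[int, list[Variant]]]) -> float:
    """Average over cases of the worst utility among gate-accepted views."""
    per_case = []
    for label, variants in cases.values():
        accepted = [v for v in variants if v.passed_gate]
        if not accepted:
            continue  # assumption: cases with no valid view are skipped
        # Worst case: the least favourable accepted view determines the case score.
        per_case.append(min(case_utility(v.prob_positive, label) for v in accepted))
    if not per_case:
        raise ValueError("no case has a gate-accepted view")
    return mean(per_case)


# Toy example with made-up predictions (not results from the paper).
cases = {
    "case_a": (1, [Variant(0.9, True), Variant(0.6, True), Variant(0.1, False)]),
    "case_b": (0, [Variant(0.2, True), Variant(0.3, True)]),
}
score = shift_stable_utility(cases)
print(round(score, 3))  # 0.875
```

In this toy input, the rejected counterfactual in `case_a` would have scored 0.19. Because the gate excluded it, the worst accepted view sets each case's score instead (0.84 and 0.91). This shows why a conservative gate matters: without it, invalid edits would dominate a min-based metric.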
Submission Number: 1