Keywords: conformal prediction, missingness, distribution shift, clinical machine learning
TL;DR: We show that cross-hospital missingness shift causes subgroup coverage failures in standard conformal prediction, and that a simple label-free missingness-aware Mondrian calibration substantially reduces these gaps with little change in set size.
Abstract: Split conformal prediction (CP) guarantees marginal coverage under exchangeability, but cross-hospital deployment breaks this assumption because missingness patterns differ across hospitals. We show that pooled calibration can hide clinically important subgroup coverage gaps: on GOSSIS, standard CP attains 90\% marginal coverage yet covers a low-missingness subgroup at only 0.775. We decompose the subgroup gap into a calibration heterogeneity term $\eta_k$ and a within-group shift term $\delta_k$, showing that the Mondrian bound removes $\eta_k$ and is tighter when $\eta_k$ exceeds the finite-sample grouping cost. We then introduce a label-free selection rule that chooses the missingness variable with the largest cross-site missingness shift and calibrates within its two subgroups. On GOSSIS, our method halves the maximum subgroup gap from 0.028--0.044 to 0.015--0.021 with less than 1\% change in set size; on MIMIC-IV, it reduces the gap from 0.049--0.075 to 0.026--0.046. Subgroup assignment is invariant to model updates and remains stable under deployment-time stress tests where optimization-based baselines degrade substantially.
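The calibration pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a classifier that outputs class probabilities, the nonconformity score $1 - \hat{p}(y \mid x)$, and missing values encoded as NaN. It also assumes that "cross-site missingness shift" is measured as the absolute difference in per-feature missingness rates between the source and target sites; the abstract does not specify the exact measure. All function names are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    if n == 0:
        return np.inf  # no calibration data: fall back to the full label set
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few points for the requested level
    return float(np.sort(scores)[k - 1])

def select_missingness_variable(X_source, X_target):
    """Pick the feature whose missingness rate differs most across sites.

    Uses only covariates (no labels). The absolute rate difference is an
    assumed shift measure, chosen here for illustration.
    """
    rate_source = np.isnan(X_source).mean(axis=0)
    rate_target = np.isnan(X_target).mean(axis=0)
    return int(np.argmax(np.abs(rate_source - rate_target)))

def mondrian_thresholds(scores, groups, alpha):
    """Calibrate one threshold per subgroup (Mondrian conformal prediction)."""
    return {int(g): conformal_quantile(scores[groups == g], alpha)
            for g in np.unique(groups)}

def prediction_sets(probs, groups, thresholds):
    """Boolean mask of included labels, using each row's subgroup threshold."""
    q = np.array([thresholds.get(int(g), np.inf) for g in groups])
    return (1.0 - probs) <= q[:, None]
```

In use, the selected feature index `j` defines two subgroups via `np.isnan(X[:, j]).astype(int)`, applied identically to calibration and deployment data. Because the grouping depends only on the missingness indicator and not on model outputs, it does not change when the underlying model is retrained.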
Submission Number: 6