A Controlled Study of Fairness Interventions for Temporal Graph Transformers on ICU Mortality Prediction
Keywords: Temporal Graph Transformers, Fairness in Machine Learning, Clinical Risk Prediction, Electronic Health Records (EHR), ICU Mortality Prediction, Graph Neural Networks, Transformer Models
TL;DR: On MIMIC-IV ICU mortality prediction, Temporal Graph Transformers do not by themselves reduce demographic gaps, and single-attribute reweighting merely shifts gaps to other attributes. Per-subgroup threshold equalization on Platt-scaled probabilities cuts TPR gaps to below 0.03.
Abstract: Temporal Graph Transformers (TGTs) have been proposed for prediction on electronic health records (EHRs), but it is unclear whether their graph architecture reduces demographic performance gaps or whether standard fairness mitigation behaves differently on TGTs than on sequential baselines. We present a controlled study on the MIMIC-IV ICU mortality task. We compare three TGT edge configurations against classical, sequential, and transformer baselines, and benchmark two in-training fairness interventions (sample reweighting, variance regularization) and three post-hoc interventions (Platt scaling, isotonic regression, per-subgroup threshold equalization). We find that (i) TGT graph structure alone does not eliminate subgroup AUROC gaps, but the choice of edge type matters: TGTFULL achieves the smallest race AUROC gap of any model under matched training; (ii) single-attribute reweighting reduces the targeted attribute’s gap but enlarges the gap on at least one other attribute in every model evaluated; and (iii) per-subgroup threshold equalization on top of Platt-scaled probabilities reduces the TPR gap from 0.20–0.23 to below 0.03 on both LSTM and TGTFULL, while calibration alone leaves AUROC gaps largely unchanged or widens them.
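To make finding (iii) concrete, the following is a minimal sketch of per-subgroup threshold equalization, not the authors' implementation. It assumes the input probabilities have already been calibrated (for example by Platt scaling), and it picks one decision threshold per subgroup so that every subgroup reaches the same target true positive rate. The function names, the target TPR of 0.8, and the synthetic two-group data are all hypothetical and chosen only for illustration; in practice the thresholds would be selected on a held-out validation split.

```python
import math
import random


def threshold_for_tpr(probs, labels, target_tpr):
    """Largest threshold at which this group's TPR is at least target_tpr."""
    pos = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("group has no positive examples")
    # Number of positives that must be flagged to reach the target TPR.
    k = max(1, math.ceil(target_tpr * len(pos)))
    return pos[k - 1]  # predict positive when prob >= this value


def equalize_thresholds(probs, labels, groups, target_tpr=0.8):
    """One threshold per subgroup, all aimed at the same target TPR."""
    thresholds = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        thresholds[g] = threshold_for_tpr(
            [probs[i] for i in idx], [labels[i] for i in idx], target_tpr
        )
    return thresholds


def tpr_gap(probs, labels, groups, thresholds):
    """Max minus min subgroup TPR under the given per-group thresholds."""
    tprs = {}
    for g in set(groups):
        pos = [p for p, y, gi in zip(probs, labels, groups) if gi == g and y == 1]
        tprs[g] = sum(p >= thresholds[g] for p in pos) / len(pos)
    return max(tprs.values()) - min(tprs.values()), tprs


# Synthetic illustration (hypothetical data): group "B" positives receive
# systematically lower scores than group "A" positives.
random.seed(0)
probs, labels, groups = [], [], []
for g, mu_pos, mu_neg in [("A", 1.0, -1.0), ("B", 0.3, -1.5)]:
    for y, mu, n in [(1, mu_pos, 300), (0, mu_neg, 700)]:
        for _ in range(n):
            z = random.gauss(mu, 1.0)
            probs.append(1.0 / (1.0 + math.exp(-z)))  # stand-in for a calibrated score
            labels.append(y)
            groups.append(g)

# A single global threshold versus per-subgroup thresholds.
gap_before, _ = tpr_gap(probs, labels, groups, {"A": 0.5, "B": 0.5})
thresholds = equalize_thresholds(probs, labels, groups, target_tpr=0.8)
gap_after, tprs_after = tpr_gap(probs, labels, groups, thresholds)
print(f"TPR gap, global threshold: {gap_before:.3f}")
print(f"TPR gap, per-group thresholds: {gap_after:.3f}")
```

Because the thresholds are applied to scores and do not change their within-group ranking, this step targets the TPR gap directly and leaves each subgroup's AUROC untouched.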
Submission Number: 159