A Controlled Study of Fairness Interventions for Temporal Graph Transformers on ICU Mortality Prediction

Jet Locati; Thomas Bezza; Dhruv Bhandari; Vanita Venkatesh; Daaniyaal Uddin; Kevin Zhu

Published: 23 May 2026, Last Modified: 23 May 2026; SD4H ICML 2026; CC BY 4.0

Keywords: Temporal Graph Transformers, Fairness in Machine Learning, Clinical Risk Prediction, Electronic Health Records (EHR), ICU Mortality Prediction, Graph Neural Networks, Transformer Models

TL;DR: On MIMIC-IV ICU mortality, Temporal Graph Transformers do not close demographic gaps on their own, and single-attribute reweighting merely shifts gaps to other attributes. Per-subgroup threshold equalization on Platt-scaled probabilities cuts TPR gaps to below 0.03.

Abstract: Temporal Graph Transformers (TGTs) have been proposed for prediction on electronic health records (EHRs), but it is unclear whether their graph architecture reduces demographic performance gaps or whether standard fairness mitigation behaves differently on TGTs than on sequential baselines. We present a controlled study on the MIMIC-IV ICU mortality task. We compare three TGT edge configurations against classical, sequential, and transformer baselines, and benchmark two in-training fairness interventions (sample reweighting, variance regularization) and three post-hoc interventions (Platt scaling, isotonic regression, per-subgroup threshold equalization). We find that (i) TGT graph structure alone does not eliminate subgroup AUROC gaps, but the choice of edge type matters: TGTFULL achieves the smallest race AUROC gap of any model under matched training; (ii) single-attribute reweighting reduces the targeted attribute's gap but enlarges the gap on at least one other attribute in every model evaluated; and (iii) per-subgroup threshold equalization on top of Platt-scaled probabilities reduces the TPR gap from 0.20–0.23 to below 0.03 on both LSTM and TGTFULL, while calibration alone leaves AUROC gaps unchanged or worse.
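To make finding (iii) concrete, the following is a minimal sketch of per-subgroup threshold equalization, assuming the probabilities have already been calibrated (for example by Platt scaling). It is not the authors' implementation: the function names, the choice of a shared target TPR, and the max-minus-min definition of the TPR gap are illustrative assumptions.

```python
import math
from collections import defaultdict


def _split_by_group(probs, labels, groups):
    """Collect (probabilities, labels) per subgroup. Hypothetical helper."""
    by_group = defaultdict(lambda: ([], []))
    for p, y, g in zip(probs, labels, groups):
        by_group[g][0].append(p)
        by_group[g][1].append(y)
    return by_group


def tpr(probs, labels, threshold):
    """True positive rate at a threshold; None if there are no positives."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    if not positives:
        return None
    return sum(p >= threshold for p in positives) / len(positives)


def threshold_for_tpr(probs, labels, target_tpr):
    """Largest threshold whose TPR is at least target_tpr."""
    positives = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("subgroup has no positive examples")
    # At least k positives must score at or above the threshold.
    k = max(1, math.ceil(target_tpr * len(positives) - 1e-12))
    return positives[k - 1]


def per_group_thresholds(probs, labels, groups, target_tpr):
    """One threshold per subgroup, each chosen to reach the same target TPR."""
    by_group = _split_by_group(probs, labels, groups)
    return {g: threshold_for_tpr(ps, ys, target_tpr) for g, (ps, ys) in by_group.items()}


def tpr_gap(probs, labels, groups, thresholds):
    """Max minus min TPR across subgroups (assumed gap definition)."""
    by_group = _split_by_group(probs, labels, groups)
    rates = [tpr(ps, ys, thresholds[g]) for g, (ps, ys) in by_group.items()]
    rates = [r for r in rates if r is not None]
    return max(rates) - min(rates)
```

In practice the thresholds would be selected on a held-out validation split and then applied to test data, so the test-set gap is typically small but not exactly zero. Passing the same threshold for every subgroup to `tpr_gap` gives the shared-threshold gap for comparison.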

Submission Number: 159
