When Do Causal Fairness Constraints Work? Reproducing and Stress-Testing Long-Term Fair Reinforcement Learning


TMLR Paper9511 Authors

05 Jun 2026 (modified: 19 Jun 2026); Under review for TMLR; CC BY 4.0

Abstract: We study the reproducibility of "A Causal Lens for Learning Long-Term Fair Policies" by Lear & Zhang (2025), which introduces qualification gain disparity (QGD) as a long-term fairness objective in sequential decision-making and proposes causality-aware PPO variants (PPO-C and PPO-Cb) to reduce it. Building on the authors' official implementation, we replicate their core experiments in a bank-lending task and test whether the reported disparity reductions, causal decomposition trends, and utility–fairness trade-offs hold. Our results largely confirm the original findings: PPO-C and PPO-Cb consistently reduce QGD relative to standard PPO and fairness-aware baselines, and the causal decomposition suggests that these reductions come mainly from making the learned policy's direct treatment of groups more similar, rather than from changes in the environment's transition dynamics. However, we find that utility preservation is weaker than originally reported in some settings. We further extend the evaluation along three axes: strongly imbalanced population ratios, a K-group extension (K > 2) based on Qualification Gain Variance (QGV), and a structurally different infectious-disease environment. These extensions show that the K-group objective is highly sensitive to the fairness coefficient: untuned penalties can collapse utility, while moderate values recover useful trade-offs. We also show that group-level causal decomposition remains diagnostically useful: reductions in QGV arise mainly through the direct policy component, while structural sources of disparity are offset by indirect dynamics rather than eliminated. Overall, we support most of the original claims while clarifying when causal long-term fairness objectives remain effective and stable.

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Feng_Zhou9

Submission Number: 9511
