Keywords: Fairness, Neural Networks, Formal Verification, Formal Guarantees
TL;DR: We propose PBA fairness, a formal measure of persistent local disparities between an input and its counterfactual with the sensitive attribute flipped, under identically perturbed numerical features.
Abstract: Fairness is a key concern in Machine Learning, requiring careful consideration of how models treat individuals from different demographic groups. In this paper, we propose Ceteris Paribus Persistent-Bias-Aware (PBA) fairness and an approach to formally quantify it. PBA fairness captures a neural network's relative confidence between an input and its counterfactual (where the sensitive attribute of the original input is flipped), while the numerical features are jointly perturbed within a local neighborhood but kept identical across both instances. As such, PBA fairness isolates the effect of the sensitive attribute, enabling formal identification of disparities that persist in the model's behavior across an entire neighborhood. We evaluate our proposed approach under both fairness-agnostic and fairness-aware training methods and compare it to several well-established fairness metrics on three benchmark datasets: Adult, COMPAS, and German Credit. The results demonstrate that our approach identifies formally proven disparities that other approaches overlook, and offers additional insights into model behavior.
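For intuition, the following is a minimal sketch of the quantity the abstract describes, assuming a binary sensitive attribute encoded as 0/1 and an sklearn-style classifier exposing predict_proba. It only estimates the confidence gap empirically by sampling shared perturbations; the paper instead quantifies it formally over the entire neighborhood. The names pba_gap_estimate, model, sensitive_idx, numeric_idx, and epsilon are illustrative, not taken from the paper.

    # Illustrative sketch, not the authors' implementation: estimate the
    # worst-case confidence gap between an input and its counterfactual
    # (sensitive attribute flipped) under shared local perturbations.
    import numpy as np

    def pba_gap_estimate(model, x, sensitive_idx, numeric_idx,
                         epsilon, n_samples=1000, seed=0):
        """Sample joint perturbations of the numerical features, applied
        identically to both instances (the ceteris paribus condition),
        and return the largest observed confidence disparity."""
        rng = np.random.default_rng(seed)
        x_cf = x.copy()
        x_cf[sensitive_idx] = 1.0 - x_cf[sensitive_idx]  # flip binary attribute

        worst_gap = 0.0
        for _ in range(n_samples):
            # One shared perturbation, identical for input and counterfactual.
            delta = rng.uniform(-epsilon, epsilon, size=len(numeric_idx))
            x_p, x_cf_p = x.copy(), x_cf.copy()
            x_p[numeric_idx] += delta
            x_cf_p[numeric_idx] += delta
            gap = abs(model.predict_proba(x_p[None, :])[0, 1]
                      - model.predict_proba(x_cf_p[None, :])[0, 1])
            worst_gap = max(worst_gap, gap)
        return worst_gap

Sampling can only lower-bound the true worst-case gap; a formal verifier, as used in the paper, certifies whether a disparity holds for every point in the neighborhood.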
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18726