Keywords: Fairness, Neural Networks, Formal Verification, Formal Guarantees
TL;DR: We propose PBA fairness, a formal measure of persistent local disparities between an input and its counterfactual with the sensitive attribute flipped, under identically perturbed numerical features.
Abstract: Fairness is a key concern in Machine Learning, requiring careful consideration of how models treat individuals from different demographic groups. In this paper, we propose Ceteris Paribus Persistent-Bias-Aware (PBA) fairness and an approach to formally quantify it. PBA fairness captures a neural network's relative confidence between an input and its counterfactual (where the sensitive attribute of the original input is flipped), while the numerical features are jointly perturbed within a local neighborhood but kept identical across both instances. As such, PBA fairness isolates the effect of the sensitive attribute, enabling formal identification of disparities that persist in the model's behavior across an entire neighborhood. We evaluate our proposed approach under both fairness-agnostic and fairness-aware training methods and compare it to several well-established fairness metrics on three benchmark datasets: Adult, COMPAS, and German Credit. The results demonstrate that our approach identifies formally proven disparities that other approaches overlook, and offers additional insights into model behavior.
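For intuition, the following is a minimal sketch of the quantity the abstract describes, assuming a binary sensitive attribute encoded as 0/1 and an sklearn-style classifier exposing predict_proba. It only estimates the confidence gap empirically by sampling shared perturbations; the paper instead quantifies it formally over the entire neighborhood. The names pba_gap_estimate, model, sensitive_idx, numeric_idx, and epsilon are illustrative, not taken from the paper.

    # Illustrative sketch, not the authors' implementation: estimate the
    # worst-case confidence gap between an input and its counterfactual
    # (sensitive attribute flipped) under shared local perturbations.
    import numpy as np

    def pba_gap_estimate(model, x, sensitive_idx, numeric_idx,
                         epsilon, n_samples=1000, seed=0):
        """Sample joint perturbations of the numerical features, applied
        identically to both instances (the ceteris paribus condition),
        and return the largest observed confidence disparity."""
        rng = np.random.default_rng(seed)
        x_cf = x.copy()
        x_cf[sensitive_idx] = 1.0 - x_cf[sensitive_idx]  # flip binary attribute

        worst_gap = 0.0
        for _ in range(n_samples):
            # One shared perturbation, identical for input and counterfactual.
            delta = rng.uniform(-epsilon, epsilon, size=len(numeric_idx))
            x_p, x_cf_p = x.copy(), x_cf.copy()
            x_p[numeric_idx] += delta
            x_cf_p[numeric_idx] += delta
            gap = abs(model.predict_proba(x_p[None, :])[0, 1]
                      - model.predict_proba(x_cf_p[None, :])[0, 1])
            worst_gap = max(worst_gap, gap)
        return worst_gap

Sampling can only lower-bound the true worst-case gap; a formal verifier, as used in the paper, certifies whether a disparity holds for every point in the neighborhood.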
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18726