On the Alignment between Fairness and Accuracy: from the Perspective of Adversarial Robustness

Junyi Chai; Taeuk Jang; Jing Gao; Xiaoqian Wang

Published: 01 May 2025, Last Modified: 23 Jul 2025, Venue: ICML 2025 poster, License: CC BY 4.0

TL;DR: We unify adversarial attacks and adversarial robustness with respect to fairness and accuracy.

Abstract: While numerous methods have been proposed to address fairness in machine learning, existing methods do not guarantee fair predictions under imperceptible feature perturbations, and a seemingly fair model can suffer from large group-wise disparities under such perturbations. Moreover, while adversarial training has been shown to reliably improve a model's robustness against adversarial feature perturbations that degrade accuracy, it has not been properly studied in the context of adversarial perturbations against fairness. To tackle these challenges, we study adversarial attacks and adversarial robustness w.r.t. two terms: fairness and accuracy. From the adversarial attack perspective, we propose a unified structure for adversarial attacks against fairness that brings together common notions of group fairness, and we theoretically prove the equivalence of adversarial attacks against different fairness notions. Further, we derive the connections between adversarial attacks against fairness and those against accuracy. From the adversarial robustness perspective, we theoretically align robustness to adversarial attacks against fairness and accuracy, showing that robustness w.r.t. one term enhances robustness w.r.t. the other. Our study suggests a novel way to unify adversarial training w.r.t. fairness and accuracy, and experiments show that our proposed method achieves better robustness w.r.t. both terms.

Lay Summary: Machine learning models are often trained to make fair predictions across different groups, but their fairness can quickly break down if even tiny, hard-to-detect changes are made to the data. A model that appears fair in everyday situations may actually treat groups quite differently when faced with subtle tweaks. Our research asks: can we build machine learning models that remain both accurate and fair when the data is changed in small, hard-to-detect ways? While it’s often believed that you have to choose between fairness and accuracy, we show that when data is maliciously modified, techniques designed to keep models accurate can, with only minor adjustments, also help maintain fairness under such changes. We developed a practical method that helps models maintain fair and accurate predictions, even when someone tries to fool them with subtle changes in the data. Our experiments show that this approach results in models whose decisions remain consistent, fair, and trustworthy, even in challenging situations.

Primary Area: Social Aspects->Fairness

Keywords: fairness, adversarial robustness

Submission Number: 7258
