TL;DR: unifying adversarial attack and adversarial robustness regarding fairness and accuracy
Abstract: While numerous work has been proposed to address fairness in machine learning, existing methods do not guarantee fair predictions under imperceptible feature perturbation, and a seemingly fair model can suffer from large group-wise disparities under such perturbation. Moreover, while adversarial training has been shown to be reliable in improving a model's robustness to defend against adversarial feature perturbation that deteriorates accuracy, it has not been properly studied in the context of adversarial perturbation against fairness. To tackle these challenges, in this paper, we study the problem of adversarial attack and adversarial robustness w.r.t. two terms: fairness and accuracy. From the adversarial attack perspective, we propose a unified structure for adversarial attacks against fairness which brings together common notions in group fairness, and we theoretically prove the equivalence of adversarial attacks against different fairness notions. Further, we derive the connections between adversarial attacks against fairness and those against accuracy. From the adversarial robustness perspective, we theoretically align robustness to adversarial attacks against fairness and accuracy, where robustness w.r.t. one term enhances robustness w.r.t. the other term. Our study suggests a novel way to unify adversarial training w.r.t. fairness and accuracy, and experiments show our proposed method achieves better robustness w.r.t. both terms.
Lay Summary: Machine learning models are often trained to make fair predictions across different groups, but their fairness can quickly break down if even tiny, hard-to-detect changes are made to the data. A model that appears fair in everyday situations may actually treat groups quite differently when faced with subtle tweaks.
Our research asks: Can we build machine learning models that remain both accurate and fair, when the data is changed in small, hard-to-detect ways? While it’s often believed that you have to choose between fairness and accuracy, we show that when data is maliciously modified, techniques designed to keep models accurate can also help maintain fairness under such changes—with only minor adjustments.
We developed a practical method that helps models maintain fair and accurate predictions, even when someone tries to fool them with subtle changes in the data. Our experiments show that this approach results in models whose decisions remain consistent, fair, and trustworthy—even in challenging situations.
Primary Area: Social Aspects->Fairness
Keywords: fairness, adversarial robustness
Submission Number: 7258
Loading