TL;DR: We propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head and to promote a fairer representation distribution across different sensitive attribute groups with a theoretical guarantee
Abstract: Improving fairness between privileged and less-privileged sensitive attribute groups (e.g, {race, gender}) has attracted lots of attention.
To enhance the model performs uniformly well in different sensitive attributes, we propose a principled \underline{R}obust \underline{A}dversarial \underline{A}ttribute \underline{N}eighbourhood (RAAN) loss to debias the classification head and to promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN is to mitigate the differences of biased representations between different sensitive attribute groups by assigning each sample an adversarial robust weight, which is defined on the representations of adversarial attribute neighbors, i.e, the samples from different protected groups. To provide efficient optimization algorithms, we cast the RAAN into a sum of coupled compositional functions and propose a stochastic adaptive (Adam-style) and non-adaptive (SGD-style) algorithm framework SCRAAN with provable theoretical guarantee. Extensive empirical studies on fairness-related benchmark datasets verify the effectiveness of the proposed method.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip
6 Replies
Loading