- Keywords: Invariant Feature Learning, Vanished Correlation, Generative Adversarial Networks, Gender Shades, Fairness in Machine Learning
- TL;DR: We propose a method based on the adversarial training strategy to learn discriminative features unbiased and invariant to the confounder(s) by incorporating a loss function that encourages a vanished correlation between the bias and learned features.
- Abstract: Presence of bias and confounding effects is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in the recent years. Such challenges range from spurious associations of confounding variables in medical studies to the bias of race in gender or face recognition systems. One solution is to enhance datasets and organize them such that they do not reflect biases, which is a cumbersome and intensive task. The alternative is to make use of available data and build models considering these biases. Traditional statistical methods apply straightforward techniques such as residualization or stratification to precomputed features to account for confounding variables. However, these techniques are not in general applicable to end-to-end deep learning methods. In this paper, we propose a method based on the adversarial training strategy to learn discriminative features unbiased and invariant to the confounder(s). This is enabled by incorporating a new adversarial loss function that encourages a vanished correlation between the bias and learned features. We apply our method to a synthetic, a medical diagnosis, and a gender classification (Gender Shades) dataset. Our results show that the learned features by our method not only result in superior prediction performance but also are uncorrelated with the bias or confounder variables. The code is available at http://blinded_for_review/.