Keywords: Robustness, Fairness, Adversarial Robustness, Adversarial Training, Reproduction
TL;DR: This report replicates the results of the paper "To be Robust or to be Fair: Towards Fairness in Adversarial Training," first identifying unfairness resulting from adversarial training and then implementing two algorithms for correcting this bias.
Abstract: Scope of Reproducibility: This work attempts to reproduce the results of the 2021 ICML paper "To be Robust or to be Fair: Towards Fairness in Adversarial Training." I first reproduce the classwise accuracy and robustness discrepancies that result from adversarial training, and then implement the authors' proposed Fair Robust Learning (FRL) algorithms for correcting this bias.

Methodology: In the spirit of education and public accessibility, this work replicates the results of the paper from first principles using Google Colab resources. To account for the limitations imposed by Colab, a much smaller model and dataset are used. All results can be replicated in approximately 10 GPU hours, within the usual timeout window of an active Colab session. Serialization is built into the example notebooks so that little progress is lost if a session crashes, and serialized models are included in the repository so that others can explore the results without running hours of code.

Results: This work finds that (1) adversarial training does in fact lead to classwise performance discrepancies, not only in standard error (accuracy) but also in attack robustness; (2) these discrepancies exacerbate biases already present in the model; (3) upweighting the standard and robust errors of poorly performing classes during training reduced these discrepancies in both standard error and robustness; and (4) increasing the attack margin for poorly performing classes during training also reduced these discrepancies, at the cost of some performance. Findings (1), (2), and (3) match the conclusions of the original paper, while (4) deviated in that it failed to increase the robustness of the most poorly performing classes. Because the model and dataset used here are entirely different from the original paper's, it is hard to quantify the exact similarity of the results; conceptually, however, I reach very similar conclusions.
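The two interventions summarized in (3) and (4) can be sketched as a single per-class update rule: classes whose error exceeds the class-average error by more than a tolerance get a larger loss weight (reweight) and a larger attack radius (remargin). The function below is a minimal illustrative sketch, not the authors' exact formulation; the function name, hyperparameters (`alpha`, `beta`, `tau`), and the linear violation rule are assumptions for illustration.

```python
def frl_update(class_errors, weights, margins, alpha=0.5, beta=0.01, tau=0.05):
    """One simplified FRL-style update step (illustrative sketch).

    class_errors: per-class (standard or robust) error rates from validation.
    weights:      per-class loss weights used during adversarial training.
    margins:      per-class attack radii (e.g. L-inf epsilon) used during training.

    Classes whose error exceeds the mean error by more than `tau` get their
    weight (reweight) and attack margin (remargin) increased in proportion
    to the violation; weights are renormalized to keep the average at 1.
    """
    avg = sum(class_errors) / len(class_errors)
    new_w, new_m = [], []
    for err, w, m in zip(class_errors, weights, margins):
        violation = err - avg - tau
        if violation > 0:  # class is performing worse than tolerated
            w = w + alpha * violation
            m = m + beta * violation
        new_w.append(w)
        new_m.append(m)
    # Renormalize so the weights still average to 1 across classes.
    scale = sum(new_w) / len(new_w)
    new_w = [w / scale for w in new_w]
    return new_w, new_m
```

In a training loop, the updated weights would multiply each class's loss term and the updated margins would set the per-class perturbation budget for the attack used to generate adversarial examples.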
What was easy: It was easy to identify the unfairness resulting from existing adversarial training methods and to implement the authors' FRL (reweight) and FRL (remargin) approaches for combating this bias. The algorithm and training approaches are well outlined in the original paper and are relatively accessible even to those with little experience in adversarial training.

What was difficult: Because of the resource limitations imposed, I was unable to successfully implement the suggested training process using the authors' specific model and dataset. Even with a smaller model and dataset, it was difficult to thoroughly tune the hyperparameters of the model and algorithm.

Communication with original authors: I did not have contact with the authors during the process of this reproduction. I reached out for feedback once I had a draft of the report, but did not hear back.
Paper Url: https://arxiv.org/pdf/2010.06121.pdf
Paper Venue: ICML 2021