Reimplementing the Adversarially Reweighted Learning model by Lahoti et al. (2020) to improve fairness without demographics

31 Jan 2021 (modified: 05 May 2023) · ML Reproducibility Challenge 2020 Blind Submission
Keywords: Fairness, ARL, Demographics
Abstract:

Scope of Reproducibility: Machine learning systems are often trained on data that does not contain protected group membership, owing to privacy rules and regulations. This makes it difficult to improve fairness for disadvantaged subgroups. As a solution, Lahoti et al. (2020) propose Adversarially Reweighted Learning (ARL). They claim that ARL significantly improves fairness for computationally identifiable subgroups.

Methodology: In this project we aimed to reproduce, replicate, and evaluate the results presented by Lahoti et al. First, the open-source TensorFlow implementation of the ARL model was used to test the reproducibility of the results. Second, the ARL model was re-implemented in PyTorch to test the replicability of the results. Finally, the significance of the ARL model's improvements over a baseline model was assessed with p-value tests. Fully training and evaluating one model took about half a minute per iteration on a 2.3 GHz 8-core Intel Core i9 processor.

Results: Our findings suggest that (1) the paper is not reproducible, (2) the paper is replicable, yet (3) the results are not significant. The main results were reproduced within 2% of the reported values. However, given our limited knowledge of the original hyperparameters and our inability to produce several additional metrics presented in the paper, we concluded that the paper is not reproducible. The PyTorch implementation produced results within 1% of the reported values, suggesting that the paper is replicable. However, the results proved to be insignificant when compared to a baseline model.

What was easy: The paper by Lahoti et al. was concise and clearly structured. This, in combination with the well-documented open-source TensorFlow implementation, provided clear guidance when re-implementing the ARL model in PyTorch.

What was difficult: Pre-processing the data proved difficult. In addition, some details regarding the model were not mentioned in the paper, so we had to make impactful assumptions about, for example, the number of training steps and the original hyperparameters.

Communication with original authors: The authors were contacted by email about some missing details in their paper. However, we did not receive a response.
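The core of the ARL setup described above is a two-player game: a learner minimizes an example-reweighted loss while an adversary learns the weights, pushing them toward high-loss (computationally identifiable) examples. The following is a minimal NumPy sketch of that reweighting idea only — it is not the authors' TensorFlow code nor our PyTorch re-implementation. As a simplification, the adversary here holds one free logit per training example (Lahoti et al. instead learn an adversary network over features and labels), and all data, group sizes, and hyperparameters are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: an "easy" majority subgroup and a "hard" minority subgroup whose
# classes overlap heavily (a stand-in for a high-loss subgroup). Sizes and
# separations are illustrative, not taken from the paper's datasets.
n_easy, n_hard = 160, 40

def make_group(n, separation):
    y = rng.integers(0, 2, n)
    x = rng.normal(0.0, 1.0, (n, 2)) + separation * (2 * y[:, None] - 1)
    return x, y.astype(float)

X_e, y_e = make_group(n_easy, 2.0)   # well separated -> low loss
X_h, y_h = make_group(n_hard, 0.3)   # strong overlap -> high loss
X = np.vstack([X_e, X_h])
y = np.concatenate([y_e, y_h])
n = len(y)
hard = np.zeros(n, dtype=bool)
hard[n_easy:] = True

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

w, b = np.zeros(2), 0.0   # learner: logistic regression
a = np.zeros(n)           # simplified adversary: one free logit per example
lr, lr_adv, eps = 0.1, 0.05, 1e-9

for step in range(1000):
    p = sigmoid(X @ w + b)
    loss = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    if step == 0:
        initial_loss = loss.mean()
    s = softmax(a)
    lam = 1.0 + n * s            # example weights; mean(lam) == 2 by construction
    # Learner: gradient *descent* on the lambda-weighted cross-entropy.
    g = lam * (p - y)
    w -= lr * X.T @ g / n
    b -= lr * g.mean()
    # Adversary: gradient *ascent* on the same objective; the softmax gradient
    # n * s_k * (loss_k - s.loss) shifts weight toward high-loss examples.
    a += lr_adv * n * s * (loss - s @ loss)

final_loss = loss.mean()
```

After training, the adversary should assign larger average weights to the hard subgroup than to the easy one, which is the mechanism ARL relies on to improve worst-group performance without access to group labels.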
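The significance testing mentioned above can be done in several ways; the paper text does not pin down a specific procedure. One stdlib-only possibility is a two-sided paired permutation test on per-seed metric scores of ARL versus the baseline. The scores below are randomly generated placeholders, not the results from the paper or our experiments.

```python
import random
import statistics

# Synthetic per-seed scores for two models. These are illustrative
# stand-ins only, NOT results reported in the paper.
random.seed(0)
arl_scores      = [0.70 + random.gauss(0, 0.01) for _ in range(10)]
baseline_scores = [0.70 + random.gauss(0, 0.01) for _ in range(10)]

def paired_permutation_pvalue(xs, ys, n_perm=10000, seed=1):
    """Two-sided paired permutation test on the mean difference.

    Under the null hypothesis the sign of each per-seed difference is
    exchangeable, so we randomly flip signs and count how often the
    permuted mean difference is at least as extreme as the observed one.
    """
    prng = random.Random(seed)
    diffs = [x - y for x, y in zip(xs, ys)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_perm):
        permuted = [d if prng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(permuted)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing avoids p == 0

p_value = paired_permutation_pvalue(arl_scores, baseline_scores)
```

A permutation test makes no normality assumption, which is convenient with the small number of seeds typically used in reproducibility studies; a paired t-test would be a common parametric alternative.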
Paper Url: https://openreview.net/forum?id=SiHVX35sDT