Keywords: Fair clustering, unsupervised learning, variational auto-encoders, adversarial neural networks, deep learning, ML reproducibility
Abstract: Scope of reproducibility
The authors propose a novel method for Deep Fair Clustering (DFC) that combines existing frameworks for fair clustering, which typically struggle with high-dimensional, large-scale data, with previous work on deep clustering, which typically does not account for fairness. Our reproducibility work targets the central claim that DFC learns fair representations with minimal loss of utility and obtains superior results on both fairness and accuracy.
Methodology
We used the code repository made available by the authors and extended it to support pretraining, additional datasets and comparative methods. We compared DFC against Deep Embedded Clustering (DEC), a comparable deep clustering method without fairness constraints, on the same four datasets (derived from MNIST, USPS, MTFL and Office-31) and with the same fairness metrics used in the paper. We selected one dataset for additional experiments aimed at validating the contribution of individual DFC components to fairness. All experiments were run on a GeForce GTX 1080 Ti GPU. Hyperparameter optimization was performed using the Weights & Biases Sweeps feature.
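For readers unfamiliar with the DEC baseline, the sketch below shows the two quantities at the core of DEC as formulated in the original DEC paper: a Student's t soft assignment of embedded points to cluster centroids, and a sharpened target distribution whose KL divergence to the soft assignments is minimized during training. It is a pure-Python illustration under simplifying assumptions (embeddings and centroids are passed as plain lists, with the degrees-of-freedom parameter `alpha` defaulting to 1). It omits autoencoder pretraining and gradient updates, is not taken from the authors' code, and does not include any of DFC's fairness components. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def soft_assign(z: List[Vector], centroids: List[Vector], alpha: float = 1.0) -> Matrix:
    """Student's t soft assignment q_ij of embedding z_i to centroid mu_j."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, mu))
            # Unnormalized kernel (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha + 1) / 2)
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]  # soft cluster frequencies f_j
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all points and clusters (the DEC clustering loss)."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0.0
    )


z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In DEC, the encoder and centroids are updated to minimize this KL divergence while the target distribution is recomputed periodically; squaring and renormalizing `q` emphasizes confident assignments and down-weights large clusters.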
Results
On the selected dataset, we reproduced accuracy to within 2% of the reported value, normalized mutual information (NMI) and entropy to within 1%, and balance to within 5%. Our DFC implementation outperformed our DEC baseline on all accuracy and fairness metrics. On the non-digit datasets, we reproduced accuracy to within 1% (Office-31) and 7% (MTFL) of the reported values but failed to obtain similar results for balance.
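The utility and fairness metrics referred to in these results can be sketched as follows. Clustering accuracy is computed here by brute force over one-to-one cluster-to-label mappings, which is practical only for small numbers of clusters (the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, scales better). For the fairness metrics, the sketch assumes one common formulation: balance as the minimum, over clusters, of the ratio between the smallest and largest protected-group counts within a cluster, and entropy as the protected-group entropy within each cluster, averaged over clusters. These definitions and the function names are illustrative assumptions; the exact normalization used in the DFC paper may differ. NMI is omitted because `sklearn.metrics.normalized_mutual_info_score` already provides it.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Dict, Hashable, Sequence


def cluster_accuracy(labels: Sequence[int], clusters: Sequence[int]) -> float:
    """Best accuracy over one-to-one cluster -> label mappings (brute force, small k only)."""
    label_ids = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    if len(label_ids) != len(cluster_ids):
        raise ValueError("sketch assumes as many clusters as classes")
    pair_counts = Counter(zip(clusters, labels))
    best = 0
    for perm in permutations(label_ids):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(n for (c, y), n in pair_counts.items() if mapping[c] == y)
        best = max(best, hits)
    return best / len(labels)


def _group_counts(clusters: Sequence[int], groups: Sequence[Hashable]) -> Dict[int, Counter]:
    """Protected-group counts per cluster."""
    per_cluster: Dict[int, Counter] = {}
    for c, g in zip(clusters, groups):
        per_cluster.setdefault(c, Counter())[g] += 1
    return per_cluster


def balance(clusters: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Min over clusters of (smallest group count / largest group count); 1.0 is perfectly fair."""
    all_groups = set(groups)
    ratios = []
    for counts in _group_counts(clusters, groups).values():
        sizes = [counts[g] for g in all_groups]  # missing groups count as 0
        ratios.append(min(sizes) / max(sizes))
    return min(ratios)


def mean_group_entropy(clusters: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Average over clusters of the entropy of the protected-group distribution."""
    per_cluster = _group_counts(clusters, groups)
    total = 0.0
    for counts in per_cluster.values():
        n = sum(counts.values())
        total += -sum((m / n) * math.log(m / n) for m in counts.values())
    return total / len(per_cluster)
```

For two equally sized protected groups, a clustering that mixes them evenly yields balance 1.0 and per-cluster entropy log 2, whereas a clustering that separates them yields 0.0 for both.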
What was easy
Reproducing results with the provided code posed no major challenges, provided we used the authors' pretrained models and the dataset already supported by the code.
What was difficult
Extending the code to the non-digit datasets was challenging, as some hyperparameter settings and architecture details were difficult to infer from the paper. Performance was sensitive to small changes in the implementation and training of the encoders, and because a large number of models require pretraining, time and resource constraints prevented us from reproducing all results for these datasets.
Communication with original authors
We had helpful one-off contact with the authors to verify hyperparameter settings.
Paper Url: https://openreview.net/forum?id=MhMYW2PqGSH&referrer=%5BML%20Reproducibility%20Challenge%202020%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2020)
Supplementary Material: zip