[Re] On the reproducibility of "CrossWalk: Fairness-Enhanced Node Representation Learning"

Eric Zila; Jonathan Gerbscheid; Luc Sträter; Kieron Kretschmar

Published: 02 Aug 2023 | Last Modified: 02 Aug 2023 | MLRC 2022 | Readers: Everyone

Keywords: rescience c, deepwalk, crosswalk, graph, node-embeddings, fairwalk, fairness, pytorch

TL;DR: Reproducibility study of "CrossWalk: Fairness-Enhanced Node Representation Learning".

Abstract: Scope of Reproducibility - The original authors present CrossWalk, an edge-reweighting algorithm that can be used in conjunction with random-walk-based node representation learning methods. We validate their claims that CrossWalk has two properties: it is fairness-enhancing, meaning it significantly reduces disparity, a measure of group fairness, and it is performance-conserving, meaning it has an insignificant effect on task performance.

Methodology - To validate the original authors' claims robustly, we develop an independent, highly modular code-base with a complete re-implementation of the original experiments. Its design lets other researchers easily run ablation experiments with different datasets, experiments, or even algorithms that can be employed in conjunction with CrossWalk. Furthermore, we create an accessible implementation of CrossWalk itself under the MIT license.

Results - Our results provide solid evidence in favor of the performance-conserving property of CrossWalk. However, we find inconclusive evidence of the fairness-enhancing property of CrossWalk, mostly due to large variation in the reproduced disparity values. On the other hand, we find additional evidence in its favor in an experiment that shows the influence of CrossWalk's hyperparameters.

What was easy - The original authors provide a code-base implementing their methodology, which greatly helped us understand the material. Furthermore, their methodology is very modular, so we could test most parts of the pipeline independently.

What was difficult - The original work contains discrepancies between the specification of CrossWalk in the formulas, the pseudo-code, and the code-base. Also, we were unable to reproduce results for one of the datasets because of missing data. Finally, the original implementation is inadequately documented, and executing it required numerous manual steps that were non-trivial and time-consuming.

Communication with original authors - To clarify some details regarding the original implementation and its structure, we reached out to the authors when we began reproducing their work. The authors were quick to respond and answered all of our questions.
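To illustrate why an edge-reweighting step composes naturally with random-walk-based embedding methods, the following minimal sketch samples a walk in which each transition is drawn in proportion to edge weight. It is not the report's implementation and does not reproduce CrossWalk's reweighting rule: the function name `weighted_random_walk`, the dict-of-dicts graph layout, and the toy weights are all assumptions made for illustration.

```python
import random


def weighted_random_walk(graph, start, length, rng=None):
    """Sample a walk whose next node is drawn proportionally to edge weight.

    `graph` maps each node to a dict {neighbor: weight}. This layout is an
    assumption of the sketch, not the format used by the report's code-base.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        # Keep only edges with positive weight; stop early at dead ends.
        candidates = [(n, w) for n, w in neighbors.items() if w > 0]
        if not candidates:
            break
        nodes, weights = zip(*candidates)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Toy graph with hypothetical weights, NOT produced by CrossWalk's rule.
toy_graph = {
    "a": {"b": 1.0, "c": 3.0},
    "b": {"a": 1.0},
    "c": {"a": 1.0},
}
walk = weighted_random_walk(toy_graph, "a", length=5, rng=random.Random(0))
print(walk)
```

Because only the edge weights change, any method that consumes such walks (DeepWalk-style skip-gram training, for instance) can be left untouched; a reweighting algorithm would simply supply different values for the weights before the walks are sampled.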

Paper Url: https://ojs.aaai.org/index.php/AAAI/article/view/21454

Paper Venue: AAAI 2022

Confirmation: The report PDF is generated from the provided camera-ready Google Colab script; the report metadata is verified from the camera-ready Google Colab script; the report contains correct author information; the report contains a link to code and SWH metadata; the report follows the ReScience LaTeX style guides as in the Reproducibility Report Template (https://paperswithcode.com/rc2022/registration); the report contains the Reproducibility Summary on the first page; the LaTeX .zip file is verified from the camera-ready Google Colab script.

Latex: zip

Journal: ReScience Volume 9 Issue 2 Article 27

Doi: https://www.doi.org/10.5281/zenodo.8173717

Code: https://archive.softwareheritage.org/swh:1:dir:0010ab17932c3abd9cb892f1b92da408df43689c
