Reproducing Improvement-Focused Causal Recourse

TMLR Paper 2238 Authors

16 Feb 2024 (modified: 13 May 2024) · Rejected by TMLR
Abstract: Reproducibility Summary

Scope of Reproducibility — In this work, we evaluate the reproducibility of the paper Improvement-Focused Causal Recourse (ICR) by König et al. (2023). Our goal is to reproduce the paper's four main claims: (1) Do CE, CR, and ICR lead to improvement? (2) Do CE, CR, and ICR lead to acceptance (by the pre- and post-recourse predictor)? (3) Do CE, CR, and ICR lead to acceptance by other predictors with comparable test error? (4) How costly are CE, CR, and ICR recommendations?

Methodology — The authors of the paper provide an implementation in PyTorch for their proposed techniques and experiments. We reuse and extend their code for our additional experiments. The computational cost of running the experiments mentioned in the paper is 110 GPU hours on an NVIDIA A100-SXM4-40GB MIG 3g.20gb accelerator. Our extended experiments took an additional 317 GPU hours.

Results — We reproduced the original paper's work through our experiments and find that its main claims largely hold. We also assess the robustness and generalizability of some of the claims through additional experiments. There, we found that one claim is not reproducible for our own synthesized 4-var SCM, and we discovered a bug in the code for the 5-var SCM. Experiments are conducted both with and without this bug.

What was easy — The commands to run the different Structural Causal Models with different confidence and hyper-parameter settings are well documented. All the relevant plots used in the paper are generated by a single command. Also, the names of the causal variables used for the SCMs in the paper match those used in the source code, which made the code easy to interpret.

What was difficult — We could not run the experiments for 10 iterations, as mentioned in the paper, due to time and resource constraints. Additionally, the authors used different random seeds for each experiment, which was not documented anywhere. We also faced minor integer-typecasting errors in the code, which we fixed on our end.

Communication with original authors — We reached out to the authors once with queries regarding the assumptions and context of some sub-claims in the paper. We received a prompt response that answered most of our questions.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andrew_Miller1
Submission Number: 2238