Keywords: data valuation, least core, shapley values
TL;DR: A reproduction study of ‘If You Like Shapley Then You’ll Love the Core’ by Tom Yan and Ariel D. Procaccia (2021).
Abstract: We investigate the results of [1] in the field of data valuation. We repeat their experiments and conclude that the (Monte Carlo) Least Core is sensitive to important characteristics of the ML problem of interest, making it difficult to apply.
Scope of Reproducibility — We test all experimental claims about Monte Carlo approximations to the Least Core and their application to standard data valuation tasks.
Methodology — We use the open-source library pyDVL for all valuation algorithms. We document all details of dataset choice and generation in this paper and release all code as open source.
Results — We were able to reproduce the results on Least Core approximation. For the task of low‐value point identification, we observed an inverted performance gap between the Least Core and Shapley values. For high‐value point identification, the Least Core slightly outperformed Shapley values. In two experiments, we had to depart from the original paper and arrived at different conclusions.
What was easy — Open-source libraries like DVC and Ray made it possible to design and run the experiments efficiently.
What was difficult — Generating the dog‐vs‐fish data was difficult because no code was available. The Monte Carlo Least Core computation was very sensitive to the choice of utility function. Some experiments were difficult to reproduce because details were missing.
Communication with original authors — We asked the authors for details on the experimental setup and they kindly and promptly sent us the code used for the paper. This was very useful in understanding all steps taken and in uncovering some weaknesses in the experiments.
Paper Url: https://ojs.aaai.org/index.php/AAAI/article/view/16721
Paper Venue: Other venue (not in list)
Venue Name: 35th AAAI Conference on Artificial Intelligence, 2021
Supplementary Material: zip
Confirmation: The report follows the ReScience LaTeX style guides as in the Reproducibility Report Template (https://paperswithcode.com/rc2022/registration). The report contains the Reproducibility Summary on the first page.
Latex: zip
Journal: ReScience Volume 9 Issue 2 Article 32
Doi: https://www.doi.org/10.5281/zenodo.8173733
Code: https://archive.softwareheritage.org/swh:1:dir:294da04ace110a1e2944203314f968a0bbf3c0a1