[Re] Classwise-Shapley values for data valuation

Published: 01 Jul 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0
Event Certifications: reproml.org/MLRC/2023/Journal_Track
Abstract: We evaluate CS-Shapley, a data valuation method for classification problems introduced in Schoch et al. (2022). We repeat the experiments of that paper and add two further methods, the Least Core (Yan & Procaccia, 2021) and Data Banzhaf (Wang & Jia, 2023), a comparison not found in the literature. We include more conservative error estimates, additional metrics such as rank stability, and a variance-corrected version of Weighted Accuracy Drop, a metric originally introduced in Schoch et al. (2022). We conclude that while CS-Shapley helps in the scenarios it was originally tested in, in particular the detection of corrupted labels, it is outperformed by the conceptually simpler Data Banzhaf in the task of detecting highly influential points.
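The abstract calls Data Banzhaf conceptually simpler than CS-Shapley. As a minimal sketch, and not the authors' implementation (which lives in the linked repository), the Banzhaf value of a training point i is its marginal contribution u(S ∪ {i}) - u(S) averaged uniformly over every subset S of the remaining points, whereas Shapley-type values weight subsets by their size. The names `banzhaf_exact`, `banzhaf_monte_carlo` and `threshold_utility` are hypothetical and chosen only for illustration. The utility here is a toy stand-in for "validation accuracy of a model trained on S". The sampling variant is plain independent Monte Carlo, not the more sample-efficient estimator proposed by Wang & Jia (2023).

```python
import itertools
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of training-point indices to a score. In data valuation
# this is typically the validation accuracy of a model trained on that subset;
# here it is a hypothetical stand-in supplied by the caller.
Utility = Callable[[FrozenSet[int]], float]


def banzhaf_exact(n: int, utility: Utility) -> List[float]:
    """Exact Banzhaf values: mean marginal contribution over all 2^(n-1) subsets."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                total += utility(s | {i}) - utility(s)
        values.append(total / 2 ** (n - 1))
    return values


def banzhaf_monte_carlo(n: int, utility: Utility, n_samples: int, seed: int = 0) -> List[float]:
    """Plain Monte Carlo estimate: each other point is kept with probability 1/2,
    which samples subsets of the remaining points uniformly."""
    rng = random.Random(seed)
    values = []
    for i in range(n):
        total = 0.0
        for _ in range(n_samples):
            s = frozenset(j for j in range(n) if j != i and rng.random() < 0.5)
            total += utility(s | {i}) - utility(s)
        values.append(total / n_samples)
    return values


if __name__ == "__main__":
    # Toy utility: score 1 once at least two points are present, else 0.
    def threshold_utility(s: FrozenSet[int]) -> float:
        return 1.0 if len(s) >= 2 else 0.0

    print(banzhaf_exact(3, threshold_utility))  # [0.5, 0.5, 0.5]
    print(banzhaf_monte_carlo(3, threshold_utility, n_samples=2000))
```

Exact enumeration needs 2^(n-1) pairs of utility evaluations per point, and in real data valuation each evaluation means retraining a model, which is why sampling estimators are used in practice.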
Certifications: Reproducibility Certification
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Minor edits as suggested by Action Editor 5mv9
Code: https://github.com/aai-institute/re-classwise-shapley
Supplementary Material: zip
Assigned Action Editor: Matthew J. Holland
Submission Number: 2227