[Re] Classwise-Shapley values for data valuation

TMLR Paper2227 Authors

16 Feb 2024 (modified: 28 May 2024), decision pending for TMLR
Abstract: We evaluate CS-Shapley, a data valuation method for classification problems introduced in Schoch et al. (2022). We repeat the experiments of the original paper and add two further methods, the Least Core (Yan & Procaccia, 2021) and Data Banzhaf (Wang & Jia, 2023), a comparison not found in the literature. We report more conservative error estimates and additional metrics, such as rank stability and a variance-corrected version of the Weighted Accuracy Drop originally introduced in Schoch et al. (2022). We conclude that while CS-Shapley helps in the scenarios it was originally tested in, in particular the detection of corrupted labels, it is outperformed by the conceptually simpler Data Banzhaf in the task of detecting highly influential points.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Upload latest changes to the code as supplemental material
Assigned Action Editor: ~Matthew_J._Holland1
Submission Number: 2227