Keywords: Shapley value, data valuation, group evaluation
Abstract: Data Shapley is an important tool for data valuation that quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to \emph{shell company attacks}, in which a provider strategically splits its group to unfairly inflate its valuation. We propose the Faithful Group Shapley Value (FGSV), which uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
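To illustrate the kind of group splitting the abstract describes, the following is a minimal sketch and not the paper's method or experimental setup. It assumes a naive group-level Shapley value that treats each group as a single player, and a hypothetical toy utility with diminishing returns (the square root of the number of pooled data points). The names `utility` and `group_shapley` are illustrative only.

```python
from itertools import permutations
from math import factorial, sqrt


def utility(n_points):
    # Hypothetical toy utility: diminishing returns in the number of pooled points.
    return sqrt(n_points)


def group_shapley(group_sizes):
    # Naive group-level Shapley value: each group is one player, and its value
    # is its marginal contribution averaged over all orderings of the groups.
    names = list(group_sizes)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        seen = 0  # number of data points contributed by groups earlier in the order
        for name in order:
            totals[name] += utility(seen + group_sizes[name]) - utility(seen)
            seen += group_sizes[name]
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


# Honest reporting: providers A and B each contribute 4 points.
honest = group_shapley({"A": 4, "B": 4})

# Provider A splits the same 4 points across two "shell" groups.
split = group_shapley({"A1": 2, "A2": 2, "B": 4})
inflated_a = split["A1"] + split["A2"]

print(f"A honest: {honest['A']:.4f}, A after splitting: {inflated_a:.4f}")
print(f"B honest: {honest['B']:.4f}, B after splitting: {split['B']:.4f}")
```

Under these assumptions, provider A's combined valuation rises after splitting while provider B's falls, even though the pooled data and the total utility are unchanged. This is the sense in which a group-as-player valuation can be gamed.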
Supplementary Material:  zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 18146