Abstract: Measuring the value of individual samples is essential for a wide range of data-driven tasks, particularly in deep learning. The Shapley value, rooted in game theory, serves as the primary metric for data valuation, and numerous methods have been proposed for its computation. While the Shapley value boasts a robust theoretical foundation, its estimation traditionally relies on game-experiment-based procedures rather than learnable approaches, leaving the construction of an explicit valuation model unaddressed. Furthermore, existing data Shapley valuation methods lack interpretability, as they fail to explain the factors contributing to a specific sample's value, whether high or low, or the mechanisms by which these factors exert influence. This study seeks to develop a learnable and interpretable data Shapley valuation model tailored to deep learning tasks. To this end, we propose a novel learning framework that maps sample characteristics directly to their Shapley values. Central to this framework is the design of an innovative neural regression tree, which surpasses existing neural regression trees in both interpretability and computational efficiency. Leveraging this structure, we introduce a new data Shapley valuation method that employs the neural regression tree as its core component. The resulting learnable valuation model offers significant advantages, such as a fixed number of parameters and the ability to reuse knowledge across tasks, while the interpretability of the model enables explanations for why certain samples are assigned specific values. Comprehensive experiments on benchmark datasets validate the effectiveness of our approach, demonstrating its initial success in producing learnable and interpretable Shapley values.
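To make concrete the "game experiment-based procedures" the abstract contrasts with a learnable model, the following is a minimal sketch of the classic permutation-sampling estimator of data Shapley values. It is not the paper's method; the `coverage` utility below is a hypothetical stand-in for the model-performance utility a real data valuation pipeline would use.

```python
import random

def shapley_monte_carlo(points, utility, n_perms=200, seed=0):
    """Estimate data Shapley values by permutation sampling.

    For each random permutation, the marginal gain in utility when a
    point joins the growing prefix is credited to that point; averaging
    these marginal contributions over permutations approximates the
    point's Shapley value.
    """
    rng = random.Random(seed)
    n = len(points)
    phi = [0.0] * n
    for _ in range(n_perms):
        order = list(range(n))
        rng.shuffle(order)
        prefix = []
        prev = utility(prefix)  # utility of the empty coalition
        for i in order:
            prefix.append(points[i])
            cur = utility(prefix)
            phi[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / n_perms for v in phi]

# Hypothetical utility: fraction of distinct labels covered by a subset,
# a toy proxy for the accuracy a model trained on that subset would reach.
labels = [0, 0, 1, 2]
def coverage(subset):
    return len(set(subset)) / 3.0

values = shapley_monte_carlo(labels, coverage)
```

Because each permutation's marginal contributions telescope, the estimated values sum exactly to the utility of the full set minus that of the empty set (the efficiency property); duplicate samples (the two label-0 points) split credit, while unique samples receive full credit, which is the kind of valuation signal a learnable model would then have to reproduce from sample characteristics alone.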