Efficient Data Valuation for Weighted Nearest Neighbor Algorithms

NeurIPS 2023 Workshop ATTRIB Submission34 Authors

Published: 27 Oct 2023, Last Modified: 08 Dec 2023ATTRIB PosterEveryoneRevisionsBibTeX
Keywords: Data Valuation, Shapley value
TL;DR: We develop efficient Data Shapley computation algorithm for weighted K nearest neighbor classifiers.
Abstract: Data Shapley is a principled way to assess the importance of individual training data sources for machine learning (ML) applications. However, it often comes with computational challenges in calculating exact Data Shapley scores. KNN-Shapley \citep{jia2019efficient}, which assigns data value leveraging the efficiently computable Data Shapley score of $K$ nearest neighbors (KNN), has gained popularity as a viable alternative due to its computationally efficient nature. However, \cite{jia2019efficient} only gives a practical algorithm for computing Data Shapley for unweighted KNN, but weighted KNN is more prevalently used in practice. This work addresses the computational challenges of calculating the exact Data Shapley for weighted KNN classifiers (WKNN-Shapley). By making small adjustments to KNN configurations, we recast the computation of WKNN-Shapley into a counting problem and introduce an $O(K^2 N^2)$ algorithm, presenting a notable improvement from the naive, impractical $O(N^K)$ algorithm. We also develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. These advancements position WKNN-Shapley as a compelling alternative to KNN-Shapley. In particular, WKNN-Shapley can select high-quality data points and improve the performance of retrieval-augmented language models.
Submission Number: 34