Keywords: influence functions, data attribution, training data attribution, unrolled differentiation, deep learning theory
TL;DR: This paper introduces distributional training data attribution, a data attribution framework that accounts for stochasticity in deep learning training and provides a mathematical justification for why influence functions work in this setting.
Abstract: Randomness is an unavoidable part of training deep learning models, yet traditional training data attribution algorithms fail to account for it rigorously. They ignore the fact that, due to stochasticity in the initialisation and batching, training on the same dataset can yield different models. In this paper, we address this shortcoming by introducing _distributional_ training data attribution (d-TDA), the goal of which is to predict how the distribution of model outputs (over training runs) depends upon the dataset. Intriguingly, we find that _influence functions_ (IFs), a popular data attribution tool, are 'secretly distributional': they emerge from our framework as the limit of unrolled differentiation, without requiring restrictive convexity assumptions. This provides a new perspective on the effectiveness of IFs in deep learning. We demonstrate the practical utility of d-TDA in experiments, including improving data pruning for vision transformers and identifying influential examples with diffusion models.
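As an illustration of the relationship the abstract describes (not part of the submission itself), below is a minimal sketch comparing a classical influence-function estimate against unrolled differentiation through a training loop, on a toy ridge-regression problem. All names, hyperparameters, and the toy objective are hypothetical choices for illustration; the paper's claim concerns the non-convex deep learning setting, which this sketch does not capture.

```python
# Hypothetical sketch: influence functions vs. unrolled differentiation on a toy convex problem.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 5))
true_w = jnp.arange(1.0, 6.0)
y = X @ true_w + 0.1 * jax.random.normal(jax.random.PRNGKey(1), (32,))
x_test, y_test = X[0], y[0]  # query point whose loss we attribute to training examples

def train_loss(w, eps):
    # eps[i] re-weights training example i; eps = 0 recovers the original objective.
    per_example = 0.5 * (X @ w - y) ** 2
    return jnp.mean((1.0 + eps) * per_example) + 1e-3 * jnp.sum(w ** 2)

def test_loss(w):
    return 0.5 * (x_test @ w - y_test) ** 2

def train(eps, steps=500, lr=0.05):
    # Plain full-batch gradient descent, kept unrolled so we can differentiate through it.
    w = jnp.zeros(5)
    grad_fn = jax.grad(train_loss)
    for _ in range(steps):
        w = w - lr * grad_fn(w, eps)
    return w

eps0 = jnp.zeros(32)
w_star = train(eps0)

# (a) Unrolled differentiation: d test_loss(train(eps)) / d eps_i, evaluated at eps = 0.
unrolled_infl = jax.grad(lambda e: test_loss(train(e)))(eps0)

# (b) Influence functions: -grad_test^T H^{-1} (d grad_train / d eps).
H = jax.hessian(train_loss)(w_star, eps0)                            # (5, 5)
g_test = jax.grad(test_loss)(w_star)                                 # (5,)
mixed = jax.jacobian(jax.grad(train_loss), argnums=1)(w_star, eps0)  # (5, 32)
if_infl = -g_test @ jnp.linalg.solve(H, mixed)

# On this convex toy problem the two estimates should agree closely.
print(jnp.corrcoef(unrolled_infl, if_infl)[0, 1])
```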
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 17125