Keywords: data attribution, data valuation, influence function
TL;DR: We present a method that nearly perfectly predicts the counterfactual effect of changing the training dataset on a deep learning model's predictions.
Abstract: The goal of data attribution is to estimate how adding or removing
a given set of training datapoints will affect model predictions.
This goal is straightforward in convex settings,
but in non-convex settings, existing methods are far less
successful: their estimates often only weakly
correlate with the ground truth.
In this work, we present a new data attribution
method (MAGIC) that combines classical methods and recent advances in
metadifferentiation (Engstrom et al., 2025) to nearly optimally
estimate the effect of adding or removing training data on model predictions
at only 2-3x the cost of training a single model.
MAGIC essentially "solves" data attribution as it is currently
studied, thus enabling downstream applications
and motivating more fine-grained future evaluations.
Primary Area: interpretability and explainable AI
Submission Number: 16561