Keywords: Training data attribution; Influence function
TL;DR: Attribution signals are noisy. Our method learns to re-weight layers to amplify the true signal, boosting accuracy and enabling fine-grained (e.g., subject vs. style) attribution.
Abstract: We study gradient-based data attribution, which aims to identify the training examples that most influence a given output. Existing methods for this task either treat network parameters uniformly or rely on implicit weighting derived from Hessian approximations; neither fully models the functional heterogeneity of network parameters.
To address this, we propose a method to explicitly learn parameter importance weights directly from data, without requiring annotated labels.
Our approach improves attribution accuracy across diverse tasks, including image classification, language modeling, and diffusion, and enables fine-grained attribution for concepts like subject and style.
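
As a rough illustration of what weighting parameters in a gradient-based attribution score can look like, here is a minimal sketch rather than the paper's method. It assumes one scalar weight per layer and a plain gradient dot product as the similarity; the function name `layer_weighted_scores`, the toy gradients, and the hand-set weights are all hypothetical. The abstract says the weights are learned from data without labels but does not say how, so the learning step is not shown.

```python
import numpy as np

def layer_weighted_scores(test_grads, train_grads, layer_weights):
    """Score training examples by a weighted sum of per-layer gradient dot products.

    test_grads:    list of L arrays, each of shape (d_l,)
    train_grads:   list of L arrays, each of shape (n_train, d_l)
    layer_weights: array of shape (L,), one weight per layer (hand-set here,
                   an assumption; the paper learns its weights from data)
    """
    scores = np.zeros(train_grads[0].shape[0])
    for w, g_test, g_train in zip(layer_weights, test_grads, train_grads):
        # Per-layer similarity between each training gradient and the test gradient.
        scores += w * (g_train @ g_test)
    return scores

# Toy gradients for a two-layer model and two training examples (made-up numbers).
test_grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
train_grads = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]),
]

# Uniform weighting treats both layers alike.
uniform = layer_weighted_scores(test_grads, train_grads, np.array([1.0, 1.0]))
# Down-weighting the second layer changes which training example ranks first.
reweighted = layer_weighted_scores(test_grads, train_grads, np.array([1.0, 0.0]))
```

In this toy setup the top-ranked training example differs between the uniform and the re-weighted scores, which is the kind of effect a weighting scheme has on attribution rankings.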
Primary Area: interpretability and explainable AI
Submission Number: 2682