A Unified Theory of Random Projection for Influence Functions

Published: 02 Mar 2026, Last Modified: 02 Mar 2026 | ICLR 2026 Workshop DATA-FM | Everyone | CC BY 4.0
Keywords: influence functions, random projection, effective dimension
TL;DR: We provide a unified theory that gives principled and actionable guidance for applying influence functions reliably at scale.
Abstract: Influence functions and related data attribution scores take the form of inverse-sensitive bilinear functionals $g^\top F^{-1} g'$, where $F \succeq 0$ is a curvature operator and $g, g'$ are training and test gradients. In modern overparameterized models, forming or inverting $F \in \mathbb{R}^{d \times d}$ is prohibitive, motivating scalable influence computation via *random projection* with a sketch $P \in \mathbb{R}^{m \times d}$. This practice is commonly justified via the Johnson–Lindenstrauss (JL) lemma, which ensures approximate preservation of Euclidean geometry for a fixed dataset. However, preserving pairwise distances does not address how sketching behaves under inversion. Furthermore, there is no existing theory that explains how sketching interacts with other widely used heuristics, such as ridge regularization (replacing $F^{-1}$ with $(F + \lambda I)^{-1}$) and structured curvature approximations. We develop a unified theory characterizing when projection provably preserves influence functions, with a focus on the required sketch size $m$. When $g, g' \in \mathrm{range}(F)$, we show that: (i) **Unregularized projection**: exact preservation holds if and only if $P$ is injective on $\mathrm{range}(F)$, which necessitates $m \ge \mathrm{rank}(F)$; (ii) **Regularized projection**: ridge regularization fundamentally alters the sketching barrier, with approximation guarantees governed by the *effective dimension* of $F$ at the regularization scale $\lambda$. This dependence is both sufficient and worst-case necessary, and can be substantially smaller than $\mathrm{rank}(F)$; (iii) **Factorized influence**: for Kronecker-factored curvatures $F = A \otimes E$, the guarantees continue to hold for decoupled sketches $P = P_A \otimes P_E$, even though such sketches exhibit structured row correlations that violate canonical i.i.d. assumptions. 
The analysis further reveals an explicit computational–statistical trade-off inherent to factorized sketches. Beyond this range-restricted setting, we analyze **out-of-range test gradients** and quantify a sketch-induced *leakage* term that arises when test gradients have components in $\ker(F)$. This yields guarantees for influence queries on general, unseen test points. Overall, this work develops a novel and rigorous theory that characterizes when projection provably preserves influence and provides principled, instance-adaptive guidance for choosing the sketch size $m$ in practice.
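The quantities named in the abstract can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the sketched estimator is $(Pg)^\top (P F P^\top + \lambda I)^{-1} (Pg')$ (with a pseudoinverse in the unregularized case), a dense Gaussian sketch $P$, and the standard definition of effective dimension $d_\lambda(F) = \mathrm{tr}\big(F (F + \lambda I)^{-1}\big)$. The abstract does not spell out any of these, and all function names and problem sizes are hypothetical.

```python
import numpy as np

def effective_dimension(F, lam):
    """Assumed definition: d_lam(F) = tr(F (F + lam I)^{-1})."""
    eigvals = np.clip(np.linalg.eigvalsh(F), 0.0, None)  # clip round-off negatives
    return float(np.sum(eigvals / (eigvals + lam)))

def ridge_influence(F, g, g_test, lam):
    """Regularized influence g^T (F + lam I)^{-1} g'."""
    return float(g @ np.linalg.solve(F + lam * np.eye(F.shape[0]), g_test))

def pinv_influence(F, g, g_test, rcond=1e-10):
    """Unregularized influence g^T F^+ g', using a pseudoinverse since F is singular."""
    return float(g @ np.linalg.pinv(F, rcond=rcond) @ g_test)

rng = np.random.default_rng(0)
d, r, m, lam = 200, 20, 40, 1e-1  # ambient dim, rank(F), sketch size, ridge (all hypothetical)

# Low-rank PSD curvature F = U diag(spectrum) U^T with a decaying spectrum.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
spectrum = 1.0 / np.arange(1, r + 1) ** 2
F = (U * spectrum) @ U.T

# Training and test gradients restricted to range(F), as in the range-restricted setting.
g = U @ rng.standard_normal(r)
g_test = U @ rng.standard_normal(r)

# Gaussian sketch with m >= rank(F); almost surely injective on range(F).
P = rng.standard_normal((m, d)) / np.sqrt(m)
F_sk, g_sk, g_test_sk = P @ F @ P.T, P @ g, P @ g_test

unreg_exact = pinv_influence(F, g, g_test)
unreg_sketch = pinv_influence(F_sk, g_sk, g_test_sk)
reg_exact = ridge_influence(F, g, g_test, lam)
reg_sketch = ridge_influence(F_sk, g_sk, g_test_sk, lam)
d_eff = effective_dimension(F, lam)

print(f"rank(F) = {r}, effective dimension at lam={lam}: {d_eff:.2f}")
print(f"unregularized: exact = {unreg_exact:.6f}, sketched = {unreg_sketch:.6f}")
print(f"regularized:   exact = {reg_exact:.6f}, sketched = {reg_sketch:.6f}")
```

In this toy setup the unregularized sketched value should agree with the exact one up to floating-point error, consistent with claim (i), while the regularized sketched value is only an approximation. Rerunning with different values of `m` and `lam` is one way to explore how the gap relates to `d_eff` rather than to `r`.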
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 123