Keywords: influence functions, inverse Hessian-vector products, random sketching
Abstract: Influence functions are a popular tool for attributing models' outputs to training data. The traditional approach relies on the calculation of inverse Hessian-vector products (iHVP), but the classical solver ``Linear time Stochastic Second-order Algorithm'' (LiSSA, Agarwal et al. (2017)) is often deemed impractical for large models due to expensive computation and hyperparameter tuning. We show that the three hyperparameters --- the scaling factor, the batch size, and the number of steps --- can be chosen depending on two specific spectral properties of the Hessian: its trace and largest eigenvalue. By evaluating them with random sketching (Swartworth and Woodruff, 2023), we find that the batch size has to be sufficiently large for LiSSA to converge; however, for all of the models we consider, this requirement is mild. We confirm our findings empirically by comparing against the Proximal Bregman Retraining Functions (PBRF, Bae et al. (2022)).
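To make the role of the hyperparameters concrete, the LiSSA recursion the abstract refers to can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the function name `lissa_ihvp` and its signature are hypothetical, and a stochastic mini-batch Hessian-vector product oracle `hvp` is assumed to be supplied by the caller.

```python
import numpy as np

def lissa_ihvp(hvp, v, scale, num_steps):
    """Sketch of the LiSSA estimator for an inverse Hessian-vector
    product H^{-1} v.

    Iterates r_{t+1} = v + (I - H/scale) r_t starting from r_0 = v,
    then returns r_T / scale, which converges to H^{-1} v provided
    `scale` exceeds the largest eigenvalue of H (so the iteration
    matrix I - H/scale is a contraction).

    hvp       -- callable x -> H x (in practice a stochastic,
                 mini-batch Hessian-vector product)
    v         -- the vector to precondition
    scale     -- the scaling factor hyperparameter
    num_steps -- the number of recursion steps
    """
    r = v.copy()
    for _ in range(num_steps):
        # r <- v + (I - H/scale) r
        r = v + r - hvp(r) / scale
    return r / scale
```

With a deterministic `hvp` and `scale` above the largest eigenvalue, the iterates contract geometrically toward `scale * H^{-1} v`; with mini-batch `hvp` the batch size controls the noise of each step, which is the quantity the abstract ties to the Hessian's trace.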
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5176