Influence functions provide a principled method to assess the contribution of individual training samples to a specific target, yet their high computation cost limits their application to large-scale models and datasets. Existing methods for influence function approximation significantly reduce the computational overhead; however, they mostly suffer from unsatisfactory accuracy due to the lack of strong convergence guarantees. The family of hyperpower methods is well known for its rigorous convergence guarantees on matrix inverse approximation, but the required matrix multiplications can incur intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method that leverages the hyperpower method, specifically Schulz's iterative algorithm. To handle the computation-intensive matrix multiplications, we incorporate the generalized Fisher information matrix (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overhead to a constant cost independent of rank on LoRA-tuned models. We first demonstrate the superior accuracy and stability of HyperINF compared to other baselines through a synthetic convergence simulation of matrix inversion. We further validate the efficacy of HyperINF through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. The codebase is available at \url{https://anonymous.4open.science/r/HyperINF-B702}.
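For reference, the core Schulz iteration for matrix inverse approximation can be sketched as follows. This is a minimal illustration of the generic algorithm, not the paper's implementation; the GFIM-based low-rank variant and the initialization choice (a standard one is assumed here) are not taken from the submission:

```python
import numpy as np

def schulz_inverse(A, num_iters=30):
    """Approximate A^{-1} with the Schulz (hyperpower) iteration:
        X_{k+1} = X_k (2I - A X_k),
    which converges quadratically when ||I - A X_0|| < 1."""
    n = A.shape[0]
    I = np.eye(n)
    # A common initialization that guarantees convergence:
    # X_0 = A^T / (||A||_1 * ||A||_inf)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(num_iters):
        X = X @ (2 * I - A @ X)
    return X
```

Each step costs two dense matrix multiplications, which is exactly the overhead that motivates the low-rank GFIM approximation on large models.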
Keywords: Data Attribution, Influence Function
TL;DR: We introduce an accurate yet efficient approximation method for influence function computation by incorporating the generalized Fisher information matrix and Schulz's iterative algorithm.
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9234