HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation

Xinyu Zhou; Simin Fan; Martin Jaggi

Published: 08 Jul 2025, Last Modified: 26 Aug 2025. Venue: COLM 2025. License: CC BY 4.0

Keywords: language model; data; influence score

TL;DR: We propose HyperINF, an efficient and accurate influence function approximation which leverages the hyperpower method, specifically Schulz's iterative algorithm.

Abstract: Influence functions provide a principled approach to assessing the contributions of individual training samples to specific targets. However, their high computational cost has limited their application to large-scale models and datasets. While existing approximation methods reduce the computational overhead, they often produce inaccurate estimates because of weak convergence guarantees. Hyperpower methods offer rigorous convergence guarantees for matrix inverse approximation, but their matrix multiplication operations typically incur intractable memory and computation costs for large-scale models. We propose HyperINF, an efficient and accurate influence function approximation that leverages the hyperpower method, specifically Schulz's iterative algorithm. To handle the computation-intensive matrix multiplication, we incorporate the generalized Fisher information matrix (GFIM) as a low-rank approximation of the Hessian matrix, reducing memory and computation overhead to constant costs. Through comprehensive convergence simulations on matrix inversion, we demonstrate HyperINF's superior accuracy and stability compared to baselines. We further validate its efficacy on extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other approaches suffer significant degradation. Our code is available at https://github.com/Blackzxy/HyperINF.
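For readers unfamiliar with Schulz's iteration, the following is a minimal NumPy sketch of the textbook update X_{k+1} = X_k (2I - A X_k) for approximating a matrix inverse. It is not taken from the HyperINF repository. The function name `schulz_inverse`, the initialization X_0 = A^T / (||A||_1 ||A||_inf) (a classical choice that guarantees convergence for nonsingular A), the damping value, and the matrix sizes are all assumptions made for illustration, and the sketch does not include the GFIM approximation.

```python
import numpy as np


def schulz_inverse(A, num_iters=30):
    """Approximate inv(A) with Schulz's iteration (illustrative sketch only)."""
    A = np.asarray(A, dtype=np.float64)
    identity = np.eye(A.shape[0])
    # Assumed initialization: A^T scaled by the product of the 1-norm and
    # the infinity-norm, so the spectral radius of (I - A X_0) is below 1.
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(num_iters):
        # Schulz update: X_{k+1} = X_k (2I - A X_k).
        # The residual I - A X_k is squared at every step.
        X = X @ (2.0 * identity - A @ X)
    return X


# Toy example: a damped low-rank matrix, loosely mimicking a damped
# Hessian approximation (sizes and damping are arbitrary assumptions).
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 3))
A = G @ G.T + 1.0 * np.eye(8)

X = schulz_inverse(A)
residual = np.linalg.norm(A @ X - np.eye(8))
print(f"Frobenius norm of A @ X - I: {residual:.2e}")
```

Each step costs two matrix multiplications, which is the expense the abstract refers to when it notes that hyperpower methods become intractable for large matrices without a compact Hessian approximation.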

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 430
