Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Efficient Inference Methods, Matrix and Tensor Factorization, Natural Language Processing, Optimization for Deep Networks, Model Selection and Structure Learning, Efficient Training Methods, Unsupervised Learning
TL;DR: This work introduces GFWSVD, a post-training LLM compression method that improves over prior approaches by using a scalable Kronecker-factored approximation of the full Fisher information matrix, capturing both diagonal and off-diagonal terms.
Abstract: The Fisher information is a fundamental concept for characterizing the sensitivity of parameters in neural networks. However, leveraging the full observed Fisher information is too expensive for large models, so most methods rely on simple diagonal approximations. While efficient, this approach ignores parameter correlations, often resulting in reduced performance on downstream tasks. In this work, we mitigate these limitations and propose Generalized Fisher-Weighted SVD (GFWSVD) — a fully deterministic post-training LLM compression technique that accounts for both diagonal and off-diagonal elements of the Fisher information matrix, providing a more accurate reflection of parameter importance. To make the method tractable, we introduce a scalable adaptation of the Kronecker-factored approximation algorithm for the observed Fisher information. We demonstrate the effectiveness of our method on LLM compression, showing improvements over existing compression baselines.
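The core idea can be illustrated with a minimal sketch of a Fisher-weighted low-rank truncation. The names and shapes below are hypothetical (a toy layer with fabricated Kronecker factors `A` and `B` of the Fisher, so `F ≈ A ⊗ B`); in the actual method these factors would be estimated from gradients on calibration data, and the details of the paper's algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix standing in for one linear layer of an LLM.
W = rng.standard_normal((64, 32))

def random_spd(n):
    # Fabricated symmetric positive-definite factor (illustration only).
    M = rng.standard_normal((n, n))
    return M @ M.T / n + np.eye(n)

B = random_spd(64)  # hypothetical output-side Kronecker factor of the Fisher
A = random_spd(32)  # hypothetical input-side Kronecker factor of the Fisher

def sqrtm_spd(M):
    # Symmetric matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

Bs, As = sqrtm_spd(B), sqrtm_spd(A)

# Fisher-weighted SVD: truncate in the whitened space Bs @ W @ As,
# then map the low-rank factors back to the original parameterization.
rank = 8
U, s, Vt = np.linalg.svd(Bs @ W @ As)
W_low = np.linalg.inv(Bs) @ (U[:, :rank] * s[:rank]) @ Vt[:rank] @ np.linalg.inv(As)

# By Eckart-Young, this truncation is optimal in the Fisher-weighted norm
# ||Bs (W - W_hat) As||_F, unlike a plain SVD truncation of W.
err_weighted = np.linalg.norm(Bs @ (W - W_low) @ As)

U2, s2, Vt2 = np.linalg.svd(W)
W_plain = (U2[:, :rank] * s2[:rank]) @ Vt2[:rank]
err_plain = np.linalg.norm(Bs @ (W - W_plain) @ As)
```

On this toy example `err_weighted` cannot exceed `err_plain`, which is the motivation for weighting by (an approximation of) the Fisher before truncating: the error that matters is the sensitivity-weighted one, not the raw Frobenius error.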
Primary Area: learning theory
Submission Number: 17420