BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression

15 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: model compression, factorization, decomposition, activation-aware, SVD, low-rank
TL;DR: We propose a principled fine-tuning-free low-rank factorization framework that works on a broad class of architectures.
Abstract: Neural network compression techniques typically require expensive fine-tuning or search procedures, rendering them impractical on commodity hardware. Inspired by recent LLM compression research, we present a general activation-aware factorization framework that can be applied to a broad range of layers. Moreover, we introduce a scalable budgeted rank allocator that allows flexible control over compression targets (e.g., retaining 50% of parameters) with no overhead. Together, these components form BALF, an efficient pipeline for compressing models without fine-tuning. We demonstrate its effectiveness across multiple scales and architectures, from ResNet-20 on CIFAR-10 to ResNeXt-101 and vision transformers on ImageNet, and show that it achieves excellent results in the fine-tuning-free regime. For instance, BALF reduces FLOPs on ResNeXt-101 by 45% with only a 1-percentage-point top-1 accuracy drop.
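Illustrative sketch (not from the paper): the abstract does not spell out the factorization, so the following shows only the generic idea behind activation-aware low-rank factorization as it commonly appears in LLM compression work. It assumes a linear layer with weight `W` and a batch of calibration activations `X`, and picks the rank-`k` factors that minimize the error on the layer's outputs rather than on the weights themselves. The function name `activation_aware_factorize`, the `eps` regularizer, and the Cholesky-based whitening are assumptions made for illustration; BALF's actual procedure and its rank allocator may differ.

```python
import numpy as np

def activation_aware_factorize(W, X, rank, eps=1e-6):
    """Hypothetical helper: factor W (out, in) into A (out, rank) @ B (rank, in).

    X is an (n, in) matrix of calibration activations. The factors minimize
    ||X W^T - X (A B)^T||_F, i.e. the error on the layer outputs.
    """
    d = W.shape[1]
    # Gram matrix of the activations; eps keeps it positive definite (assumption).
    G = X.T @ X + eps * np.eye(d)
    L = np.linalg.cholesky(G)  # G = L @ L.T
    # Truncated SVD of the whitened weight W @ L.
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    A = U[:, :rank] * s[:rank]
    # Undo the whitening: B = Vt_k @ inv(L), computed via a linear solve.
    B = np.linalg.solve(L.T, Vt[:rank].T).T
    return A, B

def plain_svd_factorize(W, rank):
    """Baseline: truncated SVD of W that ignores the activations."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]
```

With calibration activations whose input features have very different scales, the activation-aware factors give an output error no larger than the plain truncated SVD at the same rank, which is the motivation for weighting the factorization by activation statistics.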
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 6357