Kurtosis-Aware Coupled Sparsity: Training-Free Activation Sparsity for Large Language Models

ACL ARR 2026 January Submission3490 Authors

04 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, Activation Sparsity, Training-free, Efficient Inference
Abstract: The massive scale of Large Language Models (LLMs) incurs prohibitive computational costs, limiting their widespread deployment. While activation sparsity is a promising training-free solution, existing magnitude-based methods overlook weight structures and dependencies. We reveal that standard weight norms fail as importance proxies and identify instead that the heavy-tailedness of weight distributions is critical. To this end, we propose Kurtosis-Aware Coupled Sparsity (KACS), a novel training-free framework that introduces a kurtosis-based metric to explicitly capture and prioritize outlier-rich weight columns. Furthermore, we develop an interaction-aware evaluation mechanism that assesses the joint importance of structurally coupled projections (e.g., Gate-Up and Q-K-V), ensuring information retention across interacting pathways. To address varying sensitivity across layer depths, we also design an adaptive layer-wise allocation strategy guided by input-output cosine similarity. Extensive experiments on Llama and Mistral models demonstrate that KACS consistently outperforms state-of-the-art baselines, retaining over 97% of the original performance at 50% sparsity.
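To make the kurtosis-based column scoring described in the abstract concrete, the sketch below shows one plausible way to rank weight columns by excess kurtosis and keep the most heavy-tailed ones at a target sparsity. This is a minimal illustration under assumptions: the exact KACS scoring rule, any combination with magnitude or activation statistics, and the interaction-aware coupling across Gate-Up / Q-K-V projections are not reproduced here, and the function names are hypothetical.

```python
# Hypothetical sketch of kurtosis-based column selection; not the authors' implementation.
import torch


def column_kurtosis(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Excess kurtosis of each column of a 2-D weight matrix (rows x cols)."""
    mu = weight.mean(dim=0, keepdim=True)
    centered = weight - mu
    var = centered.pow(2).mean(dim=0)
    fourth = centered.pow(4).mean(dim=0)
    # Heavy-tailed (outlier-rich) columns receive high scores.
    return fourth / (var + eps).pow(2) - 3.0


def select_columns(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the (1 - sparsity) fraction of columns with the highest kurtosis."""
    scores = column_kurtosis(weight)
    n_keep = int(round(weight.shape[1] * (1.0 - sparsity)))
    return torch.topk(scores, n_keep).indices


if __name__ == "__main__":
    w = torch.randn(4096, 11008)            # e.g. one MLP projection of a Llama-style model
    kept = select_columns(w, sparsity=0.5)  # roughly half of the columns retained
    print(kept.shape)
```

In the full method, the per-column scores would presumably be aggregated across coupled projections before selection, and the 50% budget would be redistributed across layers by the cosine-similarity-guided allocation, rather than applied uniformly as in this sketch.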
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings-efficiency
Languages Studied: English
Submission Number: 3490