Keywords: Sparsity, Pruning, Efficient Inference, Large Language Models
TL;DR: We propose a global pruning framework that efficiently learns unstructured sparsity for LLMs.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their extensive parameter scales pose significant challenges for practical deployment. Unstructured pruning, which introduces fine-grained sparsity into the weight parameters, has emerged as an effective model compression strategy with minimal performance loss. While existing methods adopt a layer-wise pruning strategy to avoid the complexity of global pruning for billion-scale LLMs, they require an appropriate sparsity allocation across the layer-wise pruning objectives and often yield suboptimal solutions for the overall model. In this paper, we propose Lua-LLM ($\textbf{L}$earning $\textbf{u}$nstructured-sparsity $\textbf{a}$llocation in LLMs), a learning-based global pruning framework that explores the optimal unstructured sparsity allocation. Unlike existing pruning methods, which primarily focus on allocating per-layer sparsity, Lua-LLM achieves flexible allocation of both layer-wise and intra-layer sparsity. Furthermore, Lua-LLM leverages a soft Top-K operator to approximate the importance-based mask selection mechanism, enabling efficient binary mask learning. Experimental results on the LLaMA and OPT model families demonstrate significant performance improvements over existing methods.
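To make the soft Top-K idea concrete, below is a minimal sketch of one common way to relax a hard Top-K mask into a differentiable one: a sigmoid applied to importance scores around the k-th largest value, with a temperature controlling sharpness. The function name `soft_topk_mask`, the magnitude-based importance, and the sigmoid-with-threshold relaxation are illustrative assumptions, not necessarily the operator used by Lua-LLM.

```python
import torch

def soft_topk_mask(scores: torch.Tensor, k: int, temperature: float = 1e-2) -> torch.Tensor:
    """Differentiable relaxation of a Top-K selection over importance scores.

    Soft mask values approach 1 for the k largest scores and 0 for the rest;
    gradients flow through the sigmoid, unlike a hard Top-K indicator.
    """
    # Threshold at the k-th largest score (detached so it acts as a constant).
    threshold = torch.kthvalue(scores, scores.numel() - k + 1).values.detach()
    return torch.sigmoid((scores - threshold) / temperature)

# Usage: relax the binary mask for one weight matrix at 50% sparsity.
weights = torch.randn(1024, 1024)
importance = weights.abs().flatten()          # e.g., magnitude-based importance (assumed)
mask = soft_topk_mask(importance, k=importance.numel() // 2)
masked_weights = weights * mask.view_as(weights)  # soft-masked weights during mask learning
```

As the temperature approaches zero, the soft mask converges to the hard importance-based Top-K selection, which is why such relaxations are typically annealed or rounded at the end of training.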
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 2963