Keywords: Semi-Structured Pruning, Variational Mask Learning, Differentiable Subset Sampling, Model Compression
Abstract: Semi-structured $N$:$M$ sparsity has emerged as a practical direction for accelerating large language models (LLMs). However, existing learnable-mask approaches incur substantial parameter and memory overhead, limiting their scalability to large models and aggressive sparsity regimes. In this work, we revisit $N$:$M$ pruning from a perspective that reconciles efficiency with scalability. We propose SUSI, Semi-structured prUning via Subset samplIng, a lightweight semi-structured pruning framework that learns sparsity masks through differentiable subset sampling via weighted reservoir sampling. Unlike prior methods that model full categorical distributions over all feasible $N$:$M$ patterns, SUSI reformulates sparsity mask learning as sampling without replacement from a compact set of logits, reducing trainable parameters from combinatorial complexity to $\mathcal{O}\left(M\right)$. As a result, SUSI requires 1.5–8.75$\times$ fewer learnable parameters and significantly lower memory cost, while remaining fully aligned with hardware-friendly sparsity patterns. Extensive evaluations across multiple scales of the Qwen2.5 LLM family (0.5–7B parameters) demonstrate that SUSI achieves competitive performance with strong memory efficiency, stability across random seeds, and scalability to more aggressive $N$:$M$ sparsity patterns, offering a practical path toward efficient LLM deployment.
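The core mechanism described in the abstract — drawing $N$-of-$M$ keep masks without replacement from $\mathcal{O}(M)$ logits — can be sketched with the Gumbel top-$k$ trick, which is equivalent to weighted reservoir sampling. This is a minimal NumPy illustration under that assumption, not SUSI's actual implementation; the function names and the per-group logit layout are hypothetical, and a real training loop would use a relaxed or straight-through top-$N$ to keep gradients flowing.

```python
import numpy as np

def gumbel_topk_masks(logits, n, rng):
    """Sample one n-of-M keep mask per row of `logits`, without replacement.

    Adding independent Gumbel noise to log-weights and taking the top-n
    (the "Gumbel top-k" trick) is equivalent to weighted reservoir
    sampling: each row yields n distinct indices with probability
    proportional to exp(logits). Only M logits per group are needed,
    rather than one logit per feasible N:M pattern.
    """
    g = rng.gumbel(size=logits.shape)            # (groups, M) Gumbel noise
    order = np.argsort(-(logits + g), axis=1)    # descending perturbed logits
    masks = np.zeros_like(logits)
    np.put_along_axis(masks, order[:, :n], 1.0, axis=1)  # mark top-n as kept
    return masks

# Toy 2:4 example: a weight vector split into groups of M=4, each keeping N=2.
rng = np.random.default_rng(0)
weights = rng.normal(size=8).reshape(-1, 4)      # 2 groups of M=4 weights
logits = rng.normal(size=weights.shape)          # O(M) learnable logits per group
masks = gumbel_topk_masks(logits, n=2, rng=rng)
pruned = weights * masks
assert (masks.sum(axis=1) == 2).all()            # exactly N=2 survivors per group
```

Because the hard top-$n$ selection always keeps exactly $N$ entries per group of $M$, the resulting mask is hardware-friendly by construction, matching the $N$:$M$ pattern that sparse tensor cores accelerate.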
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: pruning, LLM Efficiency, parameter-efficient-training
Contribution Types: Approaches for low-compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 5190