Keywords: LLM Compression, Semi-Structured Pruning, Differentiable Subset Sampling, Learnable Sparsity
Abstract: The rapid growth of large language models (LLMs) has driven the need for efficient post-training optimization techniques that reduce computational and memory demands while preserving performance. Semi-structured pruning, which enforces hardware-compatible sparsity patterns such as N:M sparsity, offers a balanced approach for accelerating inference. In this study, we introduce SUSI (Semi-structured prUning via Subset samplIng), a novel semi-structured pruning method that leverages weighted reservoir sampling and differentiable subset sampling to learn high-quality N:M sparsity masks with minimal computational cost. Compared to other learnable mask methods (e.g., MaskLLM), which increase parameter complexity, SUSI reduces trainable parameters by up to 1.5× for 2:4 sparsity, enabling efficient deployment on hardware optimized for sparse computation. We evaluate SUSI on three OPT model variants (125M, 350M, and 1.3B parameters) using benchmarks including Wikitext-2 for perplexity and zero-shot NLP tasks (e.g., ARC, HellaSwag, PIQA, RACE, SciQ). SUSI consistently surpasses baselines such as SparseGPT, Wanda, and MaskLLM in perplexity while maintaining competitive zero-shot accuracy across these benchmarks. These results establish SUSI as a robust and practical solution for compressing LLMs, facilitating efficient deployment in resource-constrained environments.
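To make the core idea concrete, the following is a minimal PyTorch sketch of learning an N:M mask with the Gumbel-top-k trick, the differentiable counterpart of weighted reservoir sampling. It is not the paper's implementation: the class name GroupedNMMask, the straight-through estimator, and the temperature parameter tau are illustrative assumptions.

```python
# Minimal sketch: relaxed N:M mask learning via Gumbel-top-k (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedNMMask(nn.Module):
    """Learns per-weight logits and samples a relaxed N:M mask per group of M."""

    def __init__(self, weight_shape, n=2, m=4):
        super().__init__()
        self.n, self.m = n, m
        numel = int(torch.tensor(weight_shape).prod())
        assert numel % m == 0, "weight count must be divisible by M"
        # One trainable logit per weight; each group of M competes for N slots.
        self.logits = nn.Parameter(torch.zeros(numel // m, m))
        self.weight_shape = weight_shape

    def forward(self, tau=1.0, hard=True):
        # Gumbel-top-k: perturb logits with Gumbel noise, then keep the top N
        # per group -- a reparameterized weighted-reservoir sample.
        gumbel = -torch.log(-torch.log(torch.rand_like(self.logits) + 1e-9) + 1e-9)
        keys = (self.logits + gumbel) / tau
        soft = (F.softmax(keys, dim=-1) * self.n).clamp(max=1.0)  # relaxed N-hot
        if hard:
            # Straight-through: exact N:M pattern forward, soft gradient backward.
            idx = keys.topk(self.n, dim=-1).indices
            hard_mask = torch.zeros_like(soft).scatter_(-1, idx, 1.0)
            mask = hard_mask + soft - soft.detach()
        else:
            mask = soft
        return mask.reshape(self.weight_shape)


if __name__ == "__main__":
    w = torch.randn(8, 16)
    masker = GroupedNMMask(w.shape, n=2, m=4)
    mask = masker(tau=0.5)
    print((mask.reshape(-1, 4) > 0.5).sum(dim=-1))  # each group keeps exactly 2
    loss = ((w * mask) ** 2).sum()
    loss.backward()  # gradients flow to masker.logits
```

Because the forward pass produces an exact 2:4 pattern while gradients flow through the relaxed scores, the mask logits can be trained end-to-end against any task or distillation loss under this assumed setup.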
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 10861