Motivating Next-Gen Accelerators with Flexible $N{:}M$ Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches

Published: 18 Apr 2026, Last Modified: 22 Apr 2026, ACL 2026 Industry Track Poster, CC BY 4.0
Keywords: LLM, sparsity, efficient models, inference
Abstract: The demand for efficient large language model inference has spurred interest in sparsification, yet current hardware support remains narrowly focused on 2:4 weight sparsity. In this work, we argue that activation sparsity, despite being overlooked in hardware design, offers a promising path for dynamic, input-adaptive compression with significant I/O and memory benefits. We present a comprehensive post-training study of $N{:}M$ activation pruning across four LLMs (Llama2-7B-chat, Llama3.1-8B-Instruct, Qwen2.5-7B-Instruct, Gemma3-4B-Instruct), demonstrating that activation pruning consistently outperforms weight pruning at matched sparsity levels. We evaluate lightweight, plug-and-play error mitigation and selection strategies that require minimal or no calibration data across four sparsity patterns: 2:4, 4:8, 8:16, and 16:32. Among these, 16:32 approaches the performance of unstructured 50\% sparsity and is approximately 2.7$\times$ better than 2:4, while 8:16 offers an optimal balance of accuracy and practicality. Our results provide evidence that next-generation accelerators should consider native support for $N{:}M$ activation sparsity and can serve as a strong baseline for future methods.
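To make the $N{:}M$ activation sparsity patterns discussed in the abstract concrete, below is a minimal sketch of magnitude-based $N{:}M$ pruning applied to an activation tensor, assuming a simple top-$N$-per-group-of-$M$ selection along the hidden dimension. The function name and selection rule are illustrative assumptions, not the paper's specific method or error-mitigation strategies.

```python
import torch

def nm_prune_activations(x: torch.Tensor, n: int = 8, m: int = 16) -> torch.Tensor:
    """Zero all but the n largest-magnitude values in each group of m
    contiguous elements along the last dimension (e.g., an 8:16 pattern)."""
    orig_shape = x.shape
    assert orig_shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = x.reshape(-1, m)                          # (num_groups, m)
    # Indices of the top-n magnitudes within each group of m activations.
    topk = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return (groups * mask).reshape(orig_shape)

# Example: apply 8:16 activation sparsity to a (batch, hidden) tensor.
acts = torch.randn(2, 64)
sparse_acts = nm_prune_activations(acts, n=8, m=16)
```

Note that, unlike weight pruning, the mask here is recomputed per input, which is what makes the scheme dynamic and input-adaptive.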
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 47