Rethinking Variable-Length Encoding: Exploiting Bit Sparsity for Parallel Decoding in LLM Accelerators

Ning Yang, Fangxin Liu, Junjie Wang, Chenyang Guan, Zongwu Wang, Junping Zhao, Li Jiang, Haibing Guan

Published: 01 Dec 2025, Last Modified: 28 Feb 2026
Venue: ACM Transactions on Architecture and Code Optimization
License: CC BY-SA 4.0
External IDs: doi:10.1145/3777471