Unified Stochastic Framework for Neural Network Quantization and Pruning

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · CoRR 2024 · CC BY-SA 4.0
Abstract: Quantization and pruning are two essential techniques for compressing neural networks, yet they are typically treated independently, with little theoretical analysis connecting them. This paper introduces a unified framework for post-training quantization and pruning based on stochastic path-following algorithms. Our approach builds on the Stochastic Path Following Quantization (SPFQ) method, extending its applicability to pruning and to low-bit quantization, including the challenging 1-bit regime. By incorporating a scaling parameter and generalizing the stochastic operator, the proposed method achieves robust error correction and yields rigorous theoretical error bounds for quantization, pruning, and their combination.
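For intuition, here is a minimal sketch of a path-following quantizer with unbiased stochastic rounding in the spirit of SPFQ. This is not the authors' implementation: the alphabet construction, the `scale` parameter, the calibration-data setup, and all function and variable names are assumptions made for illustration. Weights of a single neuron are quantized one at a time, and a running error vector over calibration inputs is corrected at every step.

```python
# Illustrative sketch of stochastic path-following quantization
# (SPFQ-style); all names and design choices here are assumptions.
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def stochastic_round(v, grid):
    """Round v to one of its two nearest points on a sorted grid,
    with probability proportional to proximity, so that E[q] = v
    for v inside the grid's range (unbiased stochastic rounding)."""
    idx = np.searchsorted(grid, v)
    lo = grid[max(idx - 1, 0)]
    hi = grid[min(idx, len(grid) - 1)]
    if hi == lo:
        return lo
    p = (v - lo) / (hi - lo)  # probability of rounding up
    return hi if rng.random() < p else lo

def spfq_like_quantize(w, X, bits=1, scale=1.0):
    """Quantize one neuron's weights w (shape: n_features) against
    calibration inputs X (shape: n_samples x n_features), correcting
    the accumulated output error at each step."""
    levels = 2 ** bits
    # symmetric alphabet, e.g. {-scale, +scale} in the 1-bit regime
    grid = scale * np.linspace(-1.0, 1.0, levels)
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])  # running error: X @ (w - q) so far
    for t in range(len(w)):
        x_t = X[:, t]
        norm2 = x_t @ x_t
        # real-valued minimizer of ||u + (w_t - q) * x_t|| over q
        v = w[t] + (x_t @ u) / norm2 if norm2 > 0 else w[t]
        q[t] = stochastic_round(np.clip(v, grid[0], grid[-1]), grid)
        u = u + (w[t] - q[t]) * x_t  # fold residual into the error
    return q

# toy usage: 1-bit quantization of a random neuron
X = rng.standard_normal((256, 64))   # calibration inputs
w = 0.1 * rng.standard_normal(64)
q = spfq_like_quantize(w, X, bits=1, scale=0.1)
```

Under this reading, pruning fits the same template: if the alphabet contains 0 (e.g., {-scale, 0, +scale}), a weight rounded to 0 is pruned, so quantization, pruning, and their combination become instances of one generalized stochastic operator.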