Algorithms and Theory for Quantizing Neural Networks

SampTA 2023 Abstract
Published: 21 May 2023, Last Modified: 14 Jul 2023
Abstract: Deep neural networks (DNNs) have emerged as a popular and effective tool in a wide range of applications, including computer vision and natural language processing. However, their high memory, computational, and power requirements have prompted the development of model compression techniques that reduce these costs. Among these techniques, neural network quantization has gained significant attention as a means of reducing the memory and computational requirements of DNNs without compromising their accuracy. Nevertheless, most existing approaches to DNN quantization are ad hoc and lack rigorous performance guarantees. We present data-driven post-training quantization algorithms that can be applied directly to already-trained networks. In this context, we discuss a stochastic quantization technique and provide rigorous theoretical guarantees on its performance, even in the setting of multi-layer networks, showing that it can achieve high accuracy in the over-parametrized regime.
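The abstract does not spell out the quantization rule, so the sketch below shows only one common form of stochastic quantization: unbiased stochastic rounding of weights to a fixed alphabet. The function name `stochastic_round`, the uniform 3-bit alphabet, and all parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def stochastic_round(w, alphabet):
    """Quantize each entry of w to one of the two nearest levels of a
    strictly increasing alphabet, rounding up with probability
    proportional to proximity, so that E[Q(w)] = w on the alphabet's range."""
    alphabet = np.asarray(alphabet, dtype=float)
    w = np.clip(np.asarray(w, dtype=float), alphabet[0], alphabet[-1])
    # Index of the level just above (or equal to) each weight.
    hi_idx = np.clip(np.searchsorted(alphabet, w), 1, len(alphabet) - 1)
    lo, hi = alphabet[hi_idx - 1], alphabet[hi_idx]
    # Probability of rounding up; weights exactly on a level round to it.
    p_up = (w - lo) / (hi - lo)
    return np.where(np.random.rand(*w.shape) < p_up, hi, lo)

# Illustrative usage: quantize a random weight matrix to 8 uniform levels.
weights = np.random.randn(4, 4)
levels = np.linspace(-1.0, 1.0, 8)  # hypothetical 3-bit uniform alphabet
q_weights = stochastic_round(weights, levels)
```

The unbiasedness property E[Q(w)] = w is the standard motivation for stochastic over deterministic rounding: quantization errors become zero-mean noise that tends to cancel across the many weights of an over-parametrized layer, which is consistent with the regime the abstract highlights.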
Submission Type: Abstract