Fast Retraining of Approximate CNNs for High Accuracy

Published: 01 Jan 2025 · Last Modified: 12 Nov 2025 · IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2025 · CC BY-SA 4.0
Abstract: One technique for approximating neural networks (NNs) when deploying them to resource-constrained systems is the use of approximate multiplications. Giving up full mathematical accuracy opens new opportunities for more efficient hardware implementations. Modeling the effects of the inaccurate hardware during training improves performance but slows training significantly due to expensive type conversions and memory access operations. We propose a method to speed up the simulation of inaccurate hardware by using a composition of floating-point functions, and we provide both an analytical and a data-driven method for finding these functions. We further provide a study and implementation of per-channel quantization, a scheme that refines the granularity at which NN parameters are converted to integers and thereby boosts application accuracy. In our evaluation, our floating-point models achieve up to a $4\times$ speed-up over the commonly used lookup table implementation, while providing a high-fidelity simulation of the target function. Extending quantization with per-channel granularity yields a median accuracy improvement of 0.87 p.p. for ResNet8/CIFAR10 with 4-bit weight quantization in combination with hardware using approximate multipliers (AMs). Our extended software toolkit for the study of AMs in PyTorch is publicly available and provides a variety of building blocks for applying inaccurate product functions to NNs.
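The sketch below illustrates the core idea of the abstract under stated assumptions: it contrasts a lookup-table simulation of an 8-bit approximate multiplier with a purely floating-point surrogate built from a composition of float operations. It is not the authors' toolkit; all names (`lut_multiply`, `approx_multiply_fp`) are hypothetical, and the error model (exact product with the lowest bit truncated) is fabricated purely for illustration.

```python
# Minimal sketch: LUT-based vs. floating-point simulation of an approximate
# multiplier (AM) in PyTorch. The AM behavior assumed here is "exact product
# with the lowest bit truncated"; a real AM would be characterized offline.
import torch

BITS = 8
LEVELS = 2 ** BITS  # 256 unsigned 8-bit operand values

# --- Lookup-table simulation --------------------------------------------------
# Build a LEVELS x LEVELS table of approximate products once, up front.
a = torch.arange(LEVELS, dtype=torch.int32)
lut = (a.unsqueeze(1) * a.unsqueeze(0)) & -2  # clear the low bit of each product

def lut_multiply(x_int: torch.Tensor, w_int: torch.Tensor) -> torch.Tensor:
    """Simulate the AM by gathering from the table with integer operands."""
    return lut[x_int.long(), w_int.long()]

# --- Floating-point surrogate -------------------------------------------------
def approx_multiply_fp(x_int: torch.Tensor, w_int: torch.Tensor) -> torch.Tensor:
    """Model the same behavior as a composition of float ops, avoiding the
    integer casts and gather operation required by the table lookup."""
    prod = x_int.float() * w_int.float()
    # "Truncate the lowest bit" expressed with floating-point ops only.
    return prod - torch.remainder(prod, 2.0)

# Quick consistency check on random operands.
x = torch.randint(0, LEVELS, (1024,), dtype=torch.int32)
w = torch.randint(0, LEVELS, (1024,), dtype=torch.int32)
assert torch.equal(lut_multiply(x, w).float(), approx_multiply_fp(x, w))
```

The design point the abstract makes is that the surrogate stays entirely in floating point, so it avoids the type conversions and memory accesses of the table gather; the paper's analytical and data-driven methods aim to find such compositions for real AMs, and per-channel quantization additionally assigns each output channel its own integer scale rather than one scale per tensor.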