WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Renkun Ni; Hong-min Chu; Oscar Castaneda; Ping-yeh Chiang; Christoph Studer; Tom Goldstein

Published: 12 Jan 2021, Last Modified: 22 Jun 2025, ICLR 2021 Poster, Readers: Everyone

Keywords: quantization, efficient inference

Abstract: Low-precision neural networks represent both weights and activations with few bits, drastically reducing the cost of multiplications. Meanwhile, these products are accumulated using high-precision (typically 32-bit) additions. Additions dominate the arithmetic complexity of inference in quantized (e.g., binary) nets, and high precision is needed to avoid overflow. To further optimize inference, we propose WrapNet, an architecture that adapts neural networks to use low-precision (8-bit) additions while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer that makes results invariant to overflow. We demonstrate the efficacy of our approach using both software and hardware platforms.
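The overflow-invariance idea in the abstract can be illustrated with a minimal sketch. It assumes a signed 8-bit accumulator that wraps modulo 256 and uses a triangle wave as a stand-in periodic activation; the function names `wrap_int8` and `cyclic_activation`, the weight and activation ranges, and the triangle-wave shape are illustrative assumptions, not the activation defined in the paper.

```python
import numpy as np

def wrap_int8(x):
    """Map integers into [-128, 127], mimicking two's-complement 8-bit overflow."""
    return ((np.asarray(x, dtype=np.int64) + 128) % 256) - 128

def cyclic_activation(x, period=256):
    """Hypothetical periodic activation (triangle wave); not the paper's exact function."""
    r = np.mod(np.asarray(x, dtype=np.int64), period)  # result is always in [0, period)
    return np.minimum(r, period - r)

rng = np.random.default_rng(0)
weights = rng.choice([-1, 1], size=512)       # assumed binary weights
activations = rng.integers(0, 4, size=512)    # assumed 2-bit activations
products = weights * activations

# Reference: accumulate in high precision, no overflow.
full_sum = int(products.sum())

# Low-precision: wrap the accumulator into 8 bits after every addition.
acc = 0
for p in products:
    acc = int(wrap_int8(acc + int(p)))
wrapped_sum = acc

# Both sums are congruent modulo 256, so a period-256 activation cannot tell them apart.
same_output = bool(cyclic_activation(full_sum) == cyclic_activation(wrapped_sum))
print(full_sum, wrapped_sum, same_output)
```

Because the activation has the same period as the accumulator's wrap-around, any number of overflows during accumulation leaves the activation output unchanged in this sketch.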

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We adapt neural networks to tolerate integer overflow in extremely low-bit accumulators, and demonstrate the approach's efficacy on both software and hardware platforms.

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/wrapnet-neural-net-inference-with-ultra-low/code)
