WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

28 Sep 2020 (modified: 25 Jan 2021) · ICLR 2021 Poster
  • Keywords: quantization, efficient inference
  • Abstract: Low-precision neural networks represent both weights and activations with few bits, drastically reducing multiplication complexity. Nonetheless, these products are accumulated using high-precision (typically 32-bit) additions, an operation that dominates the arithmetic complexity of inference under extreme quantization (e.g., binary weights). To further optimize inference, we propose WrapNet, which adapts neural networks to use low-precision (8-bit) additions in the accumulators while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer and adding an overflow penalty regularizer. We demonstrate the efficacy of our approach on both software and hardware platforms.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
  • One-sentence Summary: We adapt neural networks to tolerate integer overflow in extreme low-bit accumulators, and show the efficacy of the approach on both software and hardware platforms.
  • Supplementary Material: zip
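The core idea in the abstract, that an 8-bit accumulator overflows cyclically and that a periodic activation can keep the network well-behaved under that wraparound, can be illustrated with a short sketch. The names `wrap8`, `int8_dot`, and `cyclic_activation` are illustrative, not the paper's code, and the triangular nonlinearity below is only one plausible form of a cyclic activation.

```python
def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127],
    mimicking two's-complement overflow in an 8-bit accumulator."""
    return (v + 128) % 256 - 128

def int8_dot(w, x):
    """Dot product accumulated with 8-bit additions: every partial
    sum is wrapped, so overflow happens silently and cyclically."""
    acc = 0
    for wi, xi in zip(w, x):
        acc = wrap8(acc + wi * xi)
    return acc

def cyclic_activation(acc, bits=8):
    """Hypothetical triangular 'cyclic' nonlinearity with period 2^bits.
    It is continuous across the wrap boundary (f(-128) == f(128)),
    so a wrapped accumulator value produces the same output as the
    unwrapped one; the paper's actual layer may differ in detail."""
    period = 1 << bits
    half = period // 2
    return half - abs(acc % period - half)

# Wrapped accumulation agrees with wrapping the full-precision sum:
w, x = [100, 50, -30], [2, 3, 1]
full = sum(wi * xi for wi, xi in zip(w, x))   # 320, overflows int8
print(int8_dot(w, x), wrap8(full))            # both wrap to the same value
```

The key property the sketch demonstrates is that modular accumulation commutes with modular reduction of the exact sum, so a periodic activation sees the same input whether or not intermediate additions overflowed.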