Bit-Pragmatic Deep Neural Network Computing

Jorge Albericio, Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos

Nov 04, 2016 (modified: Feb 04, 2017) · ICLR 2017 conference submission · Readers: everyone
  • Abstract: We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand with powers of two, which are added together to produce the final product. At runtime, many of these terms are zero, as they are generated when the multiplicand is combined with the zero-bits of the multiplicator. While conventional bit-parallel multipliers calculate all terms in parallel to reduce individual product latency, PRA calculates only the non-zero terms, resulting in a design whose execution time for convolutional layers is ideally proportional to the number of activation bits that are 1 (see the illustrative sketch after this list). Measurements demonstrate that for the convolutional layers of Convolutional Neural Networks during inference, PRA improves performance by 4.3x over the DaDianNao (DaDN) accelerator and by 4.5x when DaDN uses an 8-bit quantized representation. DaDN was reported to be 300x faster than commodity graphics processors.
  • TL;DR: A hardware accelerator for DNNs whose execution time for convolutional layers is proportional to the number of activation *bits* that are 1.
  • Keywords: Deep learning, Applications
  • Conflicts: eecg.toronto.edu, ece.utoronto.ca, utoronto.ca, cs.toronto.edu, cs.utoronto.ca
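
A minimal Python sketch (not from the paper, for illustration only) of the term-skipping idea described in the abstract: a product w * x can be formed by summing the shifted multiplicand w << b only for the bit positions b where the activation x has a 1, so the work is proportional to the number of 1-bits rather than the bit width. The function names and the 8-bit baseline below are assumptions chosen for the example.

```python
def essential_bits(x: int) -> list[int]:
    """Positions of the 1-bits ("essential" bits) of a non-negative integer x."""
    return [b for b in range(x.bit_length()) if (x >> b) & 1]

def term_serial_product(w: int, x: int) -> int:
    """Compute w * x by summing only the non-zero terms w << b, one per 1-bit of x."""
    return sum(w << b for b in essential_bits(x))

if __name__ == "__main__":
    w, x = 37, 0b00010010               # activation with only 2 of 8 bits set
    assert term_serial_product(w, x) == w * x
    # Ideal speedup over processing all 8 activation bits: work scales with 1-bits.
    print(8 / len(essential_bits(x)))   # -> 4.0
```

Zero-bits of the activation contribute zero-valued terms; skipping them is the source of the performance gain the abstract quantifies.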
