Bit-Pragmatic Deep Neural Network Computing

Jorge Albericio, Patrick Judd, Alberto Delmás, Sayeh Sharify, Andreas Moshovos

Feb 17, 2017 (modified: Feb 17, 2017), ICLR 2017 workshop submission
  • Abstract: We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand and powers of two, that are added together to produce the final product. At runtime, many of these terms are zero because they are generated when the multiplicand is combined with the zero bits of the multiplicator. While conventional bit-parallel multipliers calculate all terms in parallel to reduce individual product latency, Pragmatic calculates only the non-zero terms, resulting in a design whose execution time for convolutional layers is ideally proportional to the number of activation bits that are 1. Measurements show that, for the convolutional layers of Convolutional Neural Networks during inference, Pragmatic improves performance by 4.3x over the DaDianNao (DaDN) accelerator and by 4.5x when DaDN uses an 8-bit quantized representation. DaDianNao was reported to be 300x faster than commodity graphics processors.
  • TL;DR: Hardware accelerator able to skip zero bits in activations. Up to 4.5x performance improvement over the state of the art.
  • Keywords: Computer vision, Deep learning
  • Conflicts: utoronto.ca, nvidia.com
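The term decomposition described in the abstract can be illustrated in software. The sketch below is a minimal model, not the Pragmatic hardware: it assumes unsigned 16-bit fixed-point activations (for example, outputs of a ReLU) and integer weights, and all names (`ACT_BITS`, `essential_bit_offsets`, `pragmatic_product`, `ideal_term_ratio`) are hypothetical. Each product is formed by summing only the terms `weight * 2**i` for the bit positions `i` where the activation has a 1, and `ideal_term_ratio` compares the terms a bit-parallel multiplier would generate against the non-zero terms alone. That ratio ignores synchronization and other overheads, so it is an idealized bound and not the measured speedup reported above.

```python
# Minimal software model of skipping zero terms in a multiplication.
# Assumptions (hypothetical, not taken from the paper's design):
#   - activations are unsigned ACT_BITS-bit integers (e.g. post-ReLU)
#   - weights are plain Python integers (may be negative)

ACT_BITS = 16  # assumed activation precision


def essential_bit_offsets(activation: int) -> list[int]:
    """Return the bit positions (powers of two) where the activation has a 1."""
    if not 0 <= activation < (1 << ACT_BITS):
        raise ValueError("activation must be an unsigned ACT_BITS-bit value")
    return [i for i in range(ACT_BITS) if (activation >> i) & 1]


def pragmatic_product(weight: int, activation: int) -> tuple[int, int]:
    """Multiply by adding only the non-zero terms weight * 2**i.

    Returns (product, number_of_terms_processed).
    """
    offsets = essential_bit_offsets(activation)
    # Shift-and-add over the 1 bits only; zero bits contribute no term.
    product = sum(weight << i for i in offsets)
    return product, len(offsets)


def ideal_term_ratio(activations: list[int]) -> float:
    """Terms generated by a bit-parallel multiplier divided by non-zero terms.

    A bit-parallel multiplier produces ACT_BITS terms per activation regardless
    of its value; only the 1 bits yield non-zero terms. Overheads are ignored.
    """
    total_terms = ACT_BITS * len(activations)
    nonzero_terms = sum(len(essential_bit_offsets(a)) for a in activations)
    return total_terms / max(nonzero_terms, 1)
```

For example, `pragmatic_product(3, 0b1010)` adds `3 << 1` and `3 << 3` to return `(30, 2)`: two terms instead of the sixteen a 16-bit bit-parallel multiplier would generate. Because the work per product depends only on how many activation bits are 1, the model's cost tracks the popcount of the activations, which is the property the abstract describes.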
