Bit-Pragmatic Deep Neural Network Computing
Jorge Albericio, Patrick Judd, Alberto Delmás, Sayeh Sharify, Andreas Moshovos
Feb 17, 2017 (modified: Feb 17, 2017) · ICLR 2017 workshop submission · Readers: everyone
Abstract: We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency.
The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand and powers of two, which added together produce the final product. At runtime, many of these terms are zero, as they are generated when the multiplicand is combined with the zero bits of the multiplier. While conventional bit-parallel multipliers calculate all terms in parallel to reduce individual product latency, Pragmatic calculates only the non-zero terms, resulting in a design whose execution time for convolutional layers is ideally proportional to the number of activation bits that are 1. Measurements demonstrate that, for the convolutional layers of Convolutional Neural Networks during inference, Pragmatic improves performance by 4.3x over the DaDianNao (DaDN) accelerator and by 4.5x when DaDN uses an 8-bit quantized representation. DaDianNao was reported to be 300x faster than commodity graphics processors.
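The term-skipping idea the abstract describes can be sketched in a few lines of Python (a hypothetical illustration of the arithmetic, not the hardware design): the product is formed by adding the multiplicand shifted by the position of each 1 bit of the activation, so the number of additions equals the activation's popcount rather than its full bit width.

```python
def pragmatic_multiply(activation: int, weight: int):
    """Multiply by summing shifted copies of the weight, one term per
    1 bit of the activation -- the 'non-zero terms' of a shift-and-add
    multiplier. Returns the product and the number of terms computed.
    Illustrative only: the real PRA hardware processes these essential
    activation bits serially across many parallel lanes.
    """
    product = 0
    terms = 0                      # work done = popcount(activation)
    a = activation
    while a:
        lsb = a & -a               # isolate the lowest set bit
        position = lsb.bit_length() - 1
        product += weight << position   # weight * 2^position
        terms += 1
        a &= a - 1                 # clear that bit and continue
    return product, terms
```

For example, an activation of 12 (binary 1100) has only two 1 bits, so the product 12 * 5 is assembled from just two terms (5 << 2 and 5 << 3) instead of the eight a bit-parallel 8-bit multiplier would generate.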
TL;DR: A hardware accelerator that skips the zero bits in activations, delivering up to a 4.5x performance improvement over the state of the art.
Keywords: Computer vision, Deep learning