TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training

Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

MICRO 2020 (modified: 24 Apr 2023)

Abstract: TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost sparse input operand interconnect with an area-efficient hardware scheduler. The scheduler can effectively extract sparsity in the activations, the weights, and the gradients. Over a wide set of state-of-the-art models covering various applications, TensorDash accelerates the training process by 1.95× while being 1.5× more energy efficient when incorporated on top of a Tensorcore-based accelerator, at less than 5% area overhead. TensorDash is datatype-agnostic, and we demonstrate it with IEEE standard mixed-precision floating-point units and with a popular floating-point format optimized for machine learning (BFloat16).
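To make concrete why sparsity in the operand streams matters, the following toy model counts the cycles a data-parallel MAC unit would need with and without skipping multiplications in which either operand is zero. It is only a sketch under stated assumptions, not the TensorDash scheduler: the function names, the lane count, and the sample values are hypothetical, and the sparse cycle count assumes an idealized scheduler that can route any effectual operand pair to any lane. A real interconnect restricts which moves are possible, so this number is an upper bound on the benefit rather than a prediction.

```python
from math import ceil


def is_effectual(a, b):
    """A multiply-accumulate contributes to the sum only if both operands are nonzero."""
    return a != 0 and b != 0


def dense_cycles(pairs, lanes):
    """Cycles when every operand pair occupies a lane, zero or not."""
    return ceil(len(pairs) / lanes)


def ideal_sparse_cycles(pairs, lanes):
    """Cycles under an ASSUMED ideal scheduler that packs only effectual
    pairs into lanes with no movement restrictions (an upper bound)."""
    effectual = sum(1 for a, b in pairs if is_effectual(a, b))
    return ceil(effectual / lanes)


def dot(pairs, skip_zeros):
    """Dot product, optionally skipping ineffectual pairs."""
    total = 0.0
    for a, b in pairs:
        if skip_zeros and not is_effectual(a, b):
            continue
        total += a * b
    return total


# Hypothetical operand streams; zeros could come from ReLU outputs or pruned weights.
activations = [0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 3.0, 0.5]
weights = [0.7, 0.0, 0.2, 1.0, 0.4, 0.0, 2.0, 0.0]
pairs = list(zip(activations, weights))
lanes = 4  # assumed width of the MAC unit

print("dense cycles:", dense_cycles(pairs, lanes))
print("ideal sparse cycles:", ideal_sparse_cycles(pairs, lanes))
print("results match:", dot(pairs, True) == dot(pairs, False))
```

Skipping ineffectual pairs leaves the accumulated result unchanged, which is why the technique can reduce cycles without affecting the computed values; how close a real design gets to the ideal count depends on how flexibly the interconnect and scheduler can move operands.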
