Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training

Abstract: Deep neural networks are successful but computationally expensive learning systems. One of the main drains on time and energy is the well-known back-propagation (backprop) algorithm, which accounts for roughly two thirds of the computational cost of training. In this work we propose a method for reducing the computational complexity of backprop, which we call dithered backprop. It consists in applying a stochastic quantization scheme to intermediate results of the method. The particular quantization scheme, called non-subtractive dither (NSD), induces sparsity that can be exploited by computing efficient sparse matrix multiplications. Experiments on popular image classification tasks show that it induces 92% sparsity on average across a wide set of models, with no or negligible accuracy drop in comparison to state-of-the-art approaches, thus significantly reducing the computational complexity of the backward pass. Moreover, we show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits, and can therefore further reduce the computational requirements. Finally, we discuss and show potential benefits of applying dithered backprop in distributed training settings, where communication efficiency and compute efficiency may increase simultaneously with the number of participating nodes.
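
To illustrate how a non-subtractive dither quantizer can induce sparsity, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard form of NSD: uniform noise on [-step/2, step/2] is added to the input, the result is rounded to the nearest multiple of step, and the noise is not subtracted afterwards. The function name `nsd_quantize`, the step size of 4.0, and the synthetic Gaussian "gradient" are all hypothetical choices made for illustration.

```python
import numpy as np


def nsd_quantize(x, step, rng):
    """Non-subtractive dither: add uniform noise, then round to the grid.

    The noise is NOT subtracted after rounding, so every output value is an
    exact multiple of `step`, and inputs much smaller than `step` usually
    map to zero.
    """
    noise = rng.uniform(-step / 2, step / 2, size=x.shape)
    return step * np.round((x + noise) / step)


rng = np.random.default_rng(0)

# Synthetic stand-in for an intermediate gradient tensor (assumption).
grad = rng.normal(0.0, 1.0, size=100_000)

# Hypothetical step size, chosen large relative to the spread of `grad`.
step = 4.0
q = nsd_quantize(grad, step, rng)

sparsity = np.mean(q == 0)   # fraction of entries quantized to zero
bias = np.mean(q - grad)     # close to zero: the quantizer is unbiased on average

print(f"sparsity: {sparsity:.3f}, mean error: {bias:.4f}")
```

A larger step zeroes out more entries at the cost of higher quantization variance, while the average error stays near zero because the dither makes the rounding unbiased in expectation. The resulting zeros are what a sparse matrix multiplication in the backward pass could exploit.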