Linear Backprop in non-linear networks

20 Oct 2018 (modified: 13 Nov 2018) · NIPS 2018 Workshop CDNNRIA Blind Submission
  • Abstract: Backprop is the primary learning algorithm used in many machine learning systems. In practice, however, Backprop in deep neural networks is a highly sensitive learning algorithm, and successful learning depends on numerous conditions and constraints. One such constraint is avoiding weights that lead to saturated units. The motivation for avoiding unit saturation is that gradients vanish there, and as a result learning comes to a halt. Careful weight initialization and re-scaling schemes such as batch normalization keep a neuron's input activity within the linear regime, where gradients do not vanish and can flow. Here we investigate backpropagating error terms only linearly: we ignore the saturating non-linearities in the backward pass, ensuring that gradients always flow. We refer to this learning rule as Linear Backprop, since in the backward pass the network appears to be linear. In addition to ensuring persistent gradient flow, Linear Backprop is also favorable when computation is expensive, since the derivatives of the non-linearities are never computed. Our early results suggest that learning with Linear Backprop is competitive with Backprop and saves these expensive gradient computations.
  • TL;DR: We ignore non-linearities and do not compute gradients in the backward pass to save computation and to ensure gradients always flow.
  • Keywords: Backpropagation, learning algorithms, linear backpropagation
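The rule described in the abstract can be illustrated on a toy two-parameter network. The sketch below (function and variable names are our own, not from the paper) computes the loss gradient twice for a saturated tanh unit: once with standard Backprop, which multiplies by the activation derivative tanh'(h), and once with Linear Backprop, which replaces that factor with 1 so the backward pass treats the network as linear.

```python
import math

def forward(w1, w2, x):
    """Toy network: y = w2 * tanh(w1 * x)."""
    h = w1 * x          # pre-activation
    a = math.tanh(h)    # non-linear activation
    y = w2 * a
    return h, a, y

def grads(w1, w2, x, t, linear_backward=False):
    """Gradients of the squared-error loss L = 0.5 * (y - t)**2.

    With linear_backward=True the tanh derivative is replaced by 1
    (Linear Backprop): the non-linearity's gradient is never computed,
    so the error signal flows even through saturated units.
    """
    h, a, y = forward(w1, w2, x)
    dy = y - t                                   # dL/dy
    dw2 = dy * a                                 # dL/dw2
    # Standard Backprop gates the error by tanh'(h) = 1 - tanh(h)**2,
    # which is nearly 0 when |h| is large (saturation).
    gate = 1.0 if linear_backward else (1.0 - a * a)
    dw1 = dy * w2 * gate * x                     # dL/dw1
    return dw1, dw2

# Saturated regime: h = w1 * x = 10, so tanh'(h) is ~1e-8.
g_std = grads(w1=5.0, w2=1.0, x=2.0, t=0.0)
g_lin = grads(w1=5.0, w2=1.0, x=2.0, t=0.0, linear_backward=True)
print(g_std[0])  # vanishing gradient for w1
print(g_lin[0])  # Linear Backprop gradient still flows
```

In the saturated regime the standard gradient for `w1` is effectively zero, while the Linear Backprop update remains of order 1, matching the abstract's claim that gradients always flow.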