Abstract: The backpropagation algorithm remains the dominant and most successful method for
training deep neural networks (DNNs). At the same time, training DNNs at scale comes at
a significant computational cost and therefore carries a high carbon footprint. Converging evidence
suggests that input decorrelation can speed up deep learning. To date, however, this has not
translated into substantial improvements in training efficiency for large-scale DNNs, mainly
because of the challenge of enforcing fast and stable network-wide decorrelation.
Here, we show for the first time that substantially more efficient training of very deep neural
networks using decorrelated backpropagation is feasible. To achieve this, we use a novel
algorithm that induces network-wide input decorrelation with minimal computational
overhead. Combining this algorithm with careful optimizations yields a more than
two-fold speed-up and higher test accuracy compared to backpropagation when training
an 18-layer deep residual network. This demonstrates that decorrelation offers exciting
prospects for efficient deep learning at scale.
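The abstract does not spell out the decorrelation update itself. As a rough illustration only, the sketch below shows one common way to maintain a trainable decorrelation transform in front of a layer, updated from the off-diagonal covariance of its own outputs. The function name `decorrelation_step`, the learning rate, and the specific update rule are assumptions for illustration and are not necessarily the authors' exact algorithm.

```python
# Minimal sketch (assumption): a per-layer decorrelation transform z = R x whose
# matrix R is nudged so that its outputs become decorrelated. The update
# R <- R - lr * C_offdiag @ R, with C_offdiag the off-diagonal covariance of z,
# is one standard choice; the paper's actual method may differ.
import numpy as np

rng = np.random.default_rng(0)

def decorrelation_step(R, x_batch, lr=1e-2):
    """One decorrelation update from a batch of layer inputs x_batch of shape (n, d)."""
    z = x_batch @ R.T                       # decorrelated inputs fed to the layer
    cov = (z.T @ z) / len(z)                # empirical covariance of z
    off_diag = cov - np.diag(np.diag(cov))  # penalize only off-diagonal correlations
    R = R - lr * off_diag @ R               # push R toward decorrelating its outputs
    return R, z

# Toy usage: correlated 2-D inputs gradually become decorrelated.
d = 2
R = np.eye(d)
A = np.array([[1.0, 0.9], [0.9, 1.0]])      # induces strongly correlated inputs
for _ in range(2000):
    x = rng.standard_normal((128, d)) @ A
    R, z = decorrelation_step(R, x)

print(np.round(np.cov(z.T), 2))             # off-diagonal entries shrink toward zero
```

In a full network this transform would sit in front of every layer, with the decorrelation updates running alongside the usual backpropagation weight updates; keeping the per-layer overhead small is what the abstract refers to as "minimal computational overhead".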
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=4nrrZAY7KT&nesting=2&sort=date-desc
Changes Since Last Submission: Added appendices with additional experiments showing generalization to different architectures and sensitivity to decorrelation downsampling.
Assigned Action Editor: ~Krzysztof_Jerzy_Geras1
Submission Number: 2821