DistProp: A Scalable Approach to Lagrangian Training via Distributional Approximation

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Abstract: We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation. Our method leverages ideas from saddle-point optimization to derive stable first-order updates for a specific constrained optimization problem. Most importantly, we propose a novel solution that allows us to run our algorithm over mini-batches in a stochastic-gradient fashion and to decouple the number of auxiliary variables from the size of the dataset. We show empirically that our method reliably achieves higher accuracy than other comparable local (biologically plausible) learning methods on MNIST, CIFAR-10, and ImageNet.
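To make the Lagrangian perspective concrete, here is a minimal sketch of constraint-based training on a toy two-layer network: an auxiliary ("shooting") variable stands in for the hidden activation, the layer equation becomes an equality constraint, and first-order updates descend on the primal variables while ascending on the Lagrange multipliers. All names (`f1`, `h`, `lam`, the learning rates) are illustrative assumptions, not the paper's API, and the sketch uses a single fixed batch; the paper's distributional approximation, which decouples the auxiliary variables from the dataset size, is not reproduced here.

```python
# A minimal sketch of Lagrangian (saddle-point) training, assuming a toy
# two-layer MLP. This is a generic illustration, not the paper's DistProp.
import torch

torch.manual_seed(0)

# Toy data: 128 examples, 20 features, 3 classes.
x = torch.randn(128, 20)
y = torch.randint(0, 3, (128,))

# Layer weights.
W1 = torch.randn(20, 32, requires_grad=True)
W2 = torch.randn(32, 3, requires_grad=True)

# Auxiliary (shooting) variable h stands in for the hidden activation; one
# block per example here, which is exactly the coupling to dataset size that
# the paper's distributional approximation is designed to remove.
h = torch.zeros(128, 32, requires_grad=True)
lam = torch.zeros(128, 32, requires_grad=True)  # Lagrange multipliers

lr_primal, lr_dual = 1e-2, 1e-2
for step in range(500):
    pre = torch.tanh(x @ W1)                      # layer-1 output f1(x)
    out = h @ W2                                  # prediction built on h, not on f1(x)
    loss = torch.nn.functional.cross_entropy(out, y)
    constraint = h - pre                          # enforce h = f1(x)
    lagrangian = loss + (lam * constraint).sum() / len(x)

    g_W1, g_W2, g_h, g_lam = torch.autograd.grad(lagrangian, [W1, W2, h, lam])
    with torch.no_grad():
        # Descent on the primal variables (weights and auxiliary activations)...
        W1 -= lr_primal * g_W1
        W2 -= lr_primal * g_W2
        h  -= lr_primal * g_h
        # ...ascent on the duals: the first-order saddle-point update.
        lam += lr_dual * g_lam

print(f"final loss: {loss.item():.3f}")
```

Note how the first layer never receives a gradient from the loss directly: its learning signal arrives entirely through the multipliers on its constraint, which is what makes updates of this family local, and hence comparable to the biologically plausible methods the abstract mentions.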