Module-wise Training of Residual Networks via the Minimizing Movement Scheme

22 Sept 2022 (modified: 14 Oct 2024) · ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Deep learning, Layer-wise training, Optimal transport, Locking problems, Parallelism
TL;DR: We introduce a regularization for layer-wise training of neural networks, inspired by the minimizing movement scheme for gradient flows in distribution space.
Abstract: Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a simple module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. The method, which we call TRGL for Transport Regularized Greedy Learning, is particularly well-adapted to residual networks. We study it theoretically, proving that it leads to greedy modules that are regular and that successively solve the task. Experimentally, we show improved accuracy of module-wise trained networks when our regularization is added.
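To make the idea concrete, here is a minimal sketch of what transport-regularized greedy training of one residual module could look like. This is not the paper's implementation: the names (`ResidualModule`, `train_module`, `aux_head`, `tau`) are illustrative, and the penalty form, the mean squared displacement `||F(x) - x||^2` of the residual block, is an assumption based on how the minimizing movement scheme penalizes the Wasserstein-2 step cost between successive distributions.

```python
# Hypothetical sketch of transport-regularized module-wise training.
# All names here are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualModule(nn.Module):
    """One residual block F(x) = x + f(x)."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)


def train_module(module, aux_head, loader, tau=10.0, lr=1e-3, epochs=1):
    """Greedily train one residual module with an auxiliary classifier head.

    The penalty (1 / 2*tau) * E||F(x) - x||^2 is an assumed proxy for the
    squared Wasserstein-2 step cost of the minimizing movement scheme:
    it discourages the module from transporting its inputs too far.
    """
    params = list(module.parameters()) + list(aux_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            out = module(x)
            task_loss = F.cross_entropy(aux_head(out), y)
            # Mean squared displacement of the block, ||F(x) - x||^2.
            transport = (out - x).pow(2).sum(dim=1).mean()
            loss = task_loss + transport / (2 * tau)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return module
```

In a module-wise scheme of this kind, the blocks would be trained one after another, each receiving the detached outputs of the previous trained block as its inputs, so no gradients flow across module boundaries and each module (with its auxiliary head) is optimized in isolation.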
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/module-wise-training-of-residual-networks-via/code)