Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep learning, Stochastic thermodynamics, Non-equilibrium physics, Langevin dynamics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Thermodynamic speed-limit bounds on the training of deep neural networks are derived; analytical and numerical evaluation of these bounds shows when training is optimal as a function of the input covariance and the initialization.
Abstract: State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advance in stochastic thermodynamics that bounds the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance to the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable (e.g., NTK) neural networks. Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with CNNs and FCNs on CIFAR-10, which show a short, highly non-optimal regime followed by a longer, optimal regime.
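As a schematic illustration of the bound referenced in the abstract (our notation, not taken from the submission): assuming overdamped Langevin dynamics with mobility \mu and temperature T, the thermodynamic speed limit can be written as
\[
\tau \;\ge\; \frac{\mathcal{W}_2^2(p_0, p_\tau)}{\mu T \, \Sigma},
\]
where p_0 and p_\tau are the initial and fully trained weight distributions, \mathcal{W}_2 is their Wasserstein-2 distance, and \Sigma is the total entropy production over the training interval [0, \tau]. Training is "optimal in a scaling sense" when this inequality is saturated up to constant factors.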
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2770