Characterizing Training Dynamics for Finite-width Deep Neural Networks

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: feature learning, training dynamics, high dimensional statistics
TL;DR: Non-asymptotic bounds on the weights and activations of finite-width neural networks, and results suggesting that gradients have low rank.
Abstract: We study the impact of (stochastic) gradient descent on feature learning through the change in the weights and activations of finite-width deep fully connected networks. In the linear-width regime, where the input dimension and sample size scale proportionally with the width, we provide non-asymptotic bounds on the norm of the change in the weights, which characterize the initialization schemes that allow feature learning in high-dimensional problems. Based on our bounds, we find that the asymptotic rate of the norm of the change in the activations is non-increasing as we move toward earlier layers. In addition, we find that common parameterizations, such as the NTK or standard parameterization, are largely influenced by the last layer during training, underscoring the importance of the last layer for feature learning. We also find that the gradients of each layer have low rank, suggesting that (stochastic) gradient descent for feed-forward networks yields weights with a bulk + spike structure. We empirically confirm these findings on both synthetic data and CIFAR-2 data.
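The low-rank gradient claim in the abstract follows from the structure of backpropagation: for a batch of n samples, a layer's gradient is a sum of n rank-one outer products, so its rank is at most n, far below the layer width in the linear-width regime. A minimal NumPy sketch of this (with illustrative, hypothetical dimensions, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 8, 64, 256  # small batch, input dim, hidden width (illustrative)

# Random data and a two-layer ReLU network: y_hat = W2 @ relu(W1 @ x)
X = rng.standard_normal((d, n)) / np.sqrt(d)
y = rng.standard_normal((1, n))
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
W2 = rng.standard_normal((1, width)) / np.sqrt(width)

# Forward pass
H = np.maximum(W1 @ X, 0.0)           # hidden activations, shape (width, n)
y_hat = W2 @ H                        # predictions, shape (1, n)

# Backward pass for squared loss: dL/dW1 = (1/n) * delta @ X^T
err = y_hat - y                       # (1, n)
delta = (W2.T @ err) * (W1 @ X > 0)   # backpropagated signal, (width, n)
grad_W1 = delta @ X.T / n             # (width, d)

# grad_W1 is a sum of n rank-one terms, so rank(grad_W1) <= n << width
s = np.linalg.svd(grad_W1, compute_uv=False)
numerical_rank = int((s > 1e-10 * s[0]).sum())
print(numerical_rank)  # at most n = 8
```

A gradient step therefore adds a rank-<=n perturbation to the full-rank random initialization, which is one way to read the bulk + spike picture: the initialization supplies the bulk spectrum and the accumulated low-rank updates supply the spikes.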
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2249