Keywords: gradient flow, tensor decomposition, feature learning, analytic solutions, generating functions, diagram expansion, wide networks
TL;DR: We propose a general diagram-based approach to analyze scaling regimes and obtain explicit analytic solutions for gradient descent evolution in large learning problems.
Abstract: We propose a general diagram-based approach to analyze scaling regimes and obtain explicit analytic solutions for gradient descent evolution in large learning problems.
We focus on a class of problems in which an identity tensor is learned by gradient descent starting from a sum of rank-one tensors with random normal weights.
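To make the setup concrete, here is a minimal sketch (not the paper's code) of the order-2 case: the identity matrix is fit by a sum of rank-one terms with random normal initialization, trained by plain gradient descent. The values of d, n, the learning rate, and the initialization scale sigma are illustrative choices.

```python
import numpy as np

d, n = 16, 64          # tensor dimension and number of rank-one terms
sigma = 0.1            # initial weight magnitude (a key scaling parameter)
lr, steps = 1e-2, 2000

rng = np.random.default_rng(0)
W = sigma * rng.standard_normal((n, d))   # rows are the rank-one factors w_i

I = np.eye(d)
for step in range(steps):
    T = W.T @ W               # sum_i w_i w_i^T
    R = T - I                 # residual against the identity tensor
    loss = 0.5 * np.sum(R**2)
    grad = 2.0 * W @ R        # gradient of the loss w.r.t. W (R is symmetric)
    W -= lr * grad
    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss:.6f}")
```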
A central element of our approach is to expand the loss evolution in a formal power series over time. The coefficients of this expansion can be described in terms of suitable diagrams akin to Feynman diagrams.
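As a schematic of this expansion (the notation c_n and theta is illustrative, not taken from the paper): under gradient flow the loss admits a formal Taylor series in time whose coefficients are nested time derivatives at initialization.

```latex
% Under gradient flow \dot{\theta} = -\nabla_\theta L, expand L around t = 0:
\[
  L(t) = \sum_{n \ge 0} \frac{c_n}{n!}\, t^n,
  \qquad
  c_n = \frac{d^n L}{dt^n}\Big|_{t=0},
  \qquad
  \frac{dL}{dt} = -\big\|\nabla_\theta L\big\|^2 .
\]
% Iterating the chain rule expresses each c_n as a sum of weight
% contractions at initialization; diagrams index these contraction patterns.
```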
Depending on the scaling of the initial weight magnitude and the number of parameters, we find several extreme learning regimes, such as NTK, mean-field, under-parameterized learning, and free evolution.
These regimes include lazy training as well as strong feature learning. We identify these regimes with extreme points and sides of a hyperparameter polygon.
We then show that in some of these regimes, the loss power series satisfies a formal partial differential equation. In certain scenarios, this equation is first order and can be solved by the method of characteristics, yielding explicit formulas for the loss evolution that agree very well with experiment.
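For intuition, a heavily hedged schematic of the method of characteristics (the generating function f, the flux a, and the initial profile f_0 below are illustrative placeholders, not the paper's equations):

```latex
% Suppose a generating function f(t, x) of the loss series satisfies a
% first-order PDE of transport type:
\[
  \partial_t f + a(f)\,\partial_x f = 0, \qquad f(0, x) = f_0(x).
\]
% Along characteristic curves dx/dt = a(f), f is constant, giving the
% implicit solution
\[
  f(t, x) = f_0\big(x - a(f(t, x))\, t\big),
\]
% which can be inverted (locally) to obtain an explicit loss evolution.
```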
We give a series of specific examples where this methodology is fully implemented.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 4485