Keywords: deep learning theory, non-linear dynamics, optimization
TL;DR: Recently observed non-linear learning effects such as progressive sharpening and edge of stability occur generically in a simple second-order regression model.
Abstract: Recent studies of learning algorithms have identified a regime in which the largest eigenvalue of the loss Hessian first increases (progressive sharpening) and then stabilizes near the maximum value that still allows convergence, 2/η for gradient descent with step size η (edge of stability). We consider a class of predictive models that are quadratic in the parameters, which we call second-order regression models. This contrasts with the neural tangent kernel regime, where the predictive function is linear in the parameters. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits both progressive sharpening and edge of stability behavior. We then show that in higher dimensions the model exhibits this behavior generically, without the structure of a neural network, owing to a non-linearity induced in the learning dynamics. Finally, we show that edge of stability behavior in neural networks is correlated with the behavior of quadratic regression models.
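As a minimal illustration (a hypothetical toy of our own, not the model or experiments of this paper), the scalar product f(a, b) = a·b is quadratic in its two parameters, so it is a second-order regression model in the sense above. The sketch below runs plain gradient descent with step size eta on the loss L = ½(ab − y)² and records the top Hessian eigenvalue (the "sharpness") at every step, so it can be compared with the stability threshold 2/eta. The function names, initialization, target, and step size are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy second-order regression model (not the paper's exact setup):
# the prediction f(a, b) = a * b is quadratic in the parameters (a, b), and the
# loss for a single target y is L = 0.5 * (a * b - y) ** 2.

def loss(a, b, y):
    return 0.5 * (a * b - y) ** 2

def grad(a, b, y):
    r = a * b - y  # residual of the prediction
    return r * b, r * a

def top_hessian_eig(a, b, y):
    # Exact Hessian of L is [[b^2, 2ab - y], [2ab - y, a^2]];
    # return its largest eigenvalue (closed form for a symmetric 2x2 matrix).
    h11, h22, h12 = b * b, a * a, 2 * a * b - y
    mean = 0.5 * (h11 + h22)
    return mean + math.sqrt((0.5 * (h11 - h22)) ** 2 + h12 ** 2)

def run_gd(a, b, y=1.0, eta=0.05, steps=2000):
    """Plain gradient descent; returns a list of (loss, sharpness) per step."""
    history = []
    for _ in range(steps):
        history.append((loss(a, b, y), top_hessian_eig(a, b, y)))
        ga, gb = grad(a, b, y)
        a, b = a - eta * ga, b - eta * gb  # simultaneous update
    history.append((loss(a, b, y), top_hessian_eig(a, b, y)))
    return history

if __name__ == "__main__":
    eta = 0.05
    hist = run_gd(0.1, 0.2, eta=eta)
    print(f"initial sharpness {hist[0][1]:.3f}, final sharpness {hist[-1][1]:.3f}")
    print(f"stability threshold 2/eta = {2 / eta:.1f}")
```

With these settings the sharpness starts near 0.99 and ends near 2 (at a minimum with ab = y the top eigenvalue equals a² + b²), well below 2/eta = 40. So this run shows sharpening in the stable regime only. Rerunning with a larger eta, so that 2/eta falls below the sharpness the trajectory would otherwise reach, is the natural next experiment for probing edge of stability behavior in this toy.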