Quadratic models for understanding neural network dynamics

Libin Zhu; Chaoyue Liu; Adityanarayanan Radhakrishnan; Misha Belkin

Published: 01 Feb 2023, Last Modified: 12 Oct 2025, Submitted to ICLR 2023, Readers: Everyone

Keywords: quadratic models, wide neural networks, catapult phase, optimization dynamics

TL;DR: Quadratic models capture properties of wide neural networks in both optimization and generalization.

Abstract: In this work, we show that recently proposed quadratic models capture optimization and generalization properties of wide neural networks that cannot be captured by linear models. In particular, we prove that quadratic models for shallow ReLU networks exhibit the "catapult phase" from Lewkowycz et al. (2020) that arises when training such models with large learning rates. We then empirically show that the behaviour of quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models are an effective tool for analysis of neural networks.
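
For orientation, the contrast between linear and quadratic models mentioned in the abstract can be sketched as Taylor expansions of the network output in its parameters. The notation here ($f$ for the network, $w_0$ for the parameters at initialization, $H_f$ for the Hessian of $f$ with respect to $w$) is assumed for illustration and does not appear on this page; the paper's own definitions may differ in detail.

```latex
% Linear model: first-order expansion around w_0
f_{\mathrm{lin}}(w; x) = f(w_0; x) + \nabla_w f(w_0; x)^{\top} (w - w_0)

% Quadratic model: adds the second-order term
f_{\mathrm{quad}}(w; x) = f(w_0; x) + \nabla_w f(w_0; x)^{\top} (w - w_0)
    + \tfrac{1}{2} (w - w_0)^{\top} H_f(w_0; x) \, (w - w_0)
```

Under this reading, the gradient of the linear model with respect to $w$ is fixed at its initial value, whereas the gradient of the quadratic model changes with $w$, which is one way to see why the two can behave differently at large learning rates.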

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/quadratic-models-for-understanding-neural/code)
