On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

Zhiyuan Li; Sadhika Malladi; Sanjeev Arora


Published: 09 Nov 2021, Last Modified: 26 May 2025, NeurIPS 2021 Poster, Readers: Everyone

Keywords: stochastic gradient descent, stochastic differential equations, weak approximation, linear scaling rule

Abstract: It is generally recognized that a finite learning rate (LR), in contrast to an infinitesimal LR, is important for good generalization in real-life deep nets. Most attempted explanations propose approximating finite-LR SGD with Itô Stochastic Differential Equations (SDEs), but the formal justification for this approximation (e.g., Li et al., 2019) only applies to SGD with a tiny LR. Experimental verification of the approximation appears computationally infeasible. The current paper clarifies the picture with the following contributions: (a) An efficient simulation algorithm, SVAG, that provably converges to the conventionally used Itô SDE approximation. (b) A theoretically motivated, testable necessary condition for the SDE approximation, and its most famous implication, the linear scaling rule (Goyal et al., 2017), to hold. (c) Experiments using this simulation to demonstrate that the previously proposed SDE approximation can meaningfully capture the training and generalization properties of common deep nets.
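Item (a) can be illustrated with a toy sketch. As we understand the construction, SVAG with a hyperparameter l ≥ 1 takes l times as many steps with LR η/l and replaces the minibatch gradient by a*g1 + b*g2, where g1 and g2 are two independent minibatch gradients and the weights satisfy a + b = 1 and a² + b² = l. The mean gradient is unchanged while the noise variance is multiplied by l, and the paper shows that the iterates converge to the Itô SDE as l grows; consult the paper for the exact statement. The quadratic loss L(x) = x²/2, the Gaussian-noise oracle `noisy_grad`, and every function name below are hypothetical choices for illustration, not code from the linked repository. For this toy problem the stationary variance of the iterates works out to ησ²/(2 − η/l), which approaches ησ²/2, the stationary variance of the corresponding SDE dX = −X dt + √η σ dW, as l → ∞.

```python
import math
import random


def svag_coefficients(l):
    """Mixing weights (a, b) with a + b = 1 and a**2 + b**2 = l (requires l >= 1).

    Combining two independent unbiased gradient estimates as a*g1 + b*g2
    keeps the mean unchanged and multiplies the noise variance by l.
    """
    if l < 1:
        raise ValueError("l must be >= 1")
    r = math.sqrt(2 * l - 1)
    return (1 + r) / 2, (1 - r) / 2


def noisy_grad(x, sigma, rng):
    """Hypothetical toy oracle: gradient of L(x) = x**2 / 2 plus N(0, sigma**2) noise."""
    return x + rng.gauss(0.0, sigma)


def svag_run(x0, lr, l, sigma, sgd_steps, rng):
    """Run l * sgd_steps steps of size lr / l (integer l), i.e. the same total
    time lr * sgd_steps as sgd_steps steps of plain SGD with LR lr."""
    a, b = svag_coefficients(l)
    x = x0
    for _ in range(l * sgd_steps):
        # Two independent stochastic gradients, mixed to amplify the noise variance by l.
        g = a * noisy_grad(x, sigma, rng) + b * noisy_grad(x, sigma, rng)
        x -= (lr / l) * g
    return x


def stationary_variance(lr, l, sigma):
    """Closed-form stationary Var[x] for the toy problem; tends to lr*sigma**2/2 as l grows."""
    return lr * sigma ** 2 / (2 - lr / l)


def empirical_final_variance(lr, l, sigma, sgd_steps, n_runs, seed=0):
    """Monte Carlo estimate of Var[x] at the end of independent runs started at x0 = 0."""
    rng = random.Random(seed)
    finals = [svag_run(0.0, lr, l, sigma, sgd_steps, rng) for _ in range(n_runs)]
    mean = sum(finals) / n_runs
    return sum((x - mean) ** 2 for x in finals) / (n_runs - 1)


if __name__ == "__main__":
    lr, sigma = 0.5, 1.0
    for l in (1, 4, 16):
        est = empirical_final_variance(lr, l, sigma, sgd_steps=20, n_runs=1000, seed=l)
        print(f"l={l:2d}  empirical={est:.3f}  predicted={stationary_variance(lr, l, sigma):.3f}")
    print(f"SDE limit: {lr * sigma ** 2 / 2:.3f}")
```

With l = 1 the weights are (1, 0), so the loop reduces to plain SGD with LR η; larger l moves the iterate statistics toward the SDE limit without shrinking the total training time.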


TL;DR: We study, with theory and experiments (using a new method, SVAG), the validity of the popular SDE approximation to SGD and of the linear scaling rule.

Supplementary Material: pdf

Code: https://github.com/sadhikamalladi/svag
