Nesterov's method is the discretization of a differential equation with Hessian damping

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Withdrawn Submission
Abstract: Su-Boyd-Candes (2014) made a connection between Nesterov's method and an ordinary differential equation (ODE). We show that if a Hessian damping term is added to the ODE from Su-Boyd-Candes (2014), then Nesterov's method arises as a straightforward discretization of the modified ODE. Analogously, in the strongly convex case, adding a Hessian damping term to Polyak's ODE and discretizing yields Nesterov's method for strongly convex functions. Despite the Hessian term, both second-order ODEs can be represented as first-order systems. Established Liapunov analysis is used to recover the accelerated rates of convergence in both continuous and discrete time. Moreover, the Liapunov analysis extends to the case of stochastic gradients, which allows the full-gradient case to be treated as a special case of the stochastic one. The result is a unified approach to convex acceleration in both continuous and discrete time and in both the stochastic and full-gradient cases.
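For context, a minimal sketch of the equations involved, in LaTeX (the damping coefficient $\beta$ below is an illustrative assumption; the paper's exact constants may differ):

\[
\ddot X(t) + \frac{3}{t}\,\dot X(t) + \nabla f(X(t)) = 0 \qquad \text{(Su-Boyd-Candes ODE)}
\]
% Note: \beta > 0 is an illustrative Hessian-damping coefficient, not necessarily the paper's exact constant.
\[
\ddot X(t) + \frac{3}{t}\,\dot X(t) + \beta\,\nabla^2 f(X(t))\,\dot X(t) + \nabla f(X(t)) = 0 \qquad \text{(Hessian-damped variant)}
\]

Because $\nabla^2 f(X)\,\dot X = \frac{d}{dt}\nabla f(X(t))$ by the chain rule, substituting $V = \dot X + \beta\,\nabla f(X)$ rewrites the Hessian-damped ODE as a first-order system in $(X, V)$ that never evaluates the Hessian, which is what makes a straightforward gradient-only discretization possible.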
Keywords: Nesterov's method, convex optimization, first-order methods, stochastic gradient descent, differential equations, Liapunov's method
TL;DR: We show that Nesterov's method arises as a straightforward discretization of an ODE different from the one in Su-Boyd-Candes, and we prove acceleration in the stochastic case.