Improving Neural Ordinary Differential Equations with Nesterov's Accelerated Gradient Method

Nghia Nguyen; Tan Minh Nguyen; Võ Thục Khánh Huyền; Stanley Osher; Thieu Vo

Published: 31 Oct 2022, Last Modified: 14 Jan 2023 | NeurIPS 2022 Accept | Readers: Everyone

Keywords: neural ordinary differential equations, nesterov, momentum

TL;DR: We propose Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's accelerated gradient method, to speed up the training and inference of NODEs.

Abstract: We propose Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's accelerated gradient (NAG) method, and a generalization called GNesterovNODEs. Taking advantage of the $\mathcal{O}(1/k^{2})$ convergence rate of the NAG scheme, GNesterovNODEs speed up training and inference by reducing the number of function evaluations (NFEs) needed to solve the ODEs. We also prove that the adjoint state of a GNesterovNODE itself satisfies a GNesterovNODE, which accelerates both the forward and backward ODE solvers and allows the model to be scaled up for large-scale tasks. We empirically corroborate the advantage of GNesterovNODEs on a wide range of practical applications, including point cloud separation, image classification, and sequence modeling. Compared to NODEs, GNesterovNODEs require significantly fewer NFEs while achieving better accuracy across our experiments.
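
To illustrate the "second-order ODE limit of NAG" mentioned in the abstract, here is a minimal sketch that is not the authors' implementation. It assumes the well-known continuous-time limit $\ddot{x}(t) + \frac{3}{t}\dot{x}(t) + \nabla f(x(t)) = 0$ and integrates it as a first-order system with a fixed-step Runge-Kutta scheme. The toy objective $f(x) = \frac{1}{2}x^2$, the start time `t0 = 0.1` (chosen to avoid the singularity at $t = 0$), and all function names are illustrative assumptions; in a NesterovNODE the gradient term would presumably be replaced by a learned network.

```python
def grad_f(x):
    # Gradient of the toy objective f(x) = 0.5 * x**2 (assumption for illustration).
    return x


def nag_ode_rhs(t, x, v):
    # First-order form of x'' + (3/t) x' + grad_f(x) = 0:
    #   x' = v
    #   v' = -(3/t) v - grad_f(x)
    return v, -(3.0 / t) * v - grad_f(x)


def rk4_solve(x0, v0, t0, t1, n_steps):
    # Classical fixed-step RK4; also counts right-hand-side evaluations (NFEs).
    h = (t1 - t0) / n_steps
    t, x, v = t0, x0, v0
    nfe = 0
    for _ in range(n_steps):
        k1x, k1v = nag_ode_rhs(t, x, v)
        k2x, k2v = nag_ode_rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = nag_ode_rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = nag_ode_rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
        nfe += 4  # four right-hand-side evaluations per RK4 step
    return x, v, nfe


# Start at x = 1 with zero velocity and integrate toward the minimizer x* = 0.
x_final, v_final, nfe = rk4_solve(x0=1.0, v0=0.0, t0=0.1, t1=20.0, n_steps=2000)
print(f"x(20) = {x_final:.5f}, NFEs = {nfe}")
```

The time-dependent damping term $3/t$ is what distinguishes this system from a plain gradient flow. With a fixed-step solver the NFE count depends only on the step count, so the NFE savings the abstract refers to would only be visible with an adaptive solver.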

Supplementary Material: zip
