## Improving Neural Ordinary Differential Equations with Nesterov's Accelerated Gradient Method

Published: 31 Oct 2022, 18:00, Last Modified: 14 Jan 2023, 18:38, NeurIPS 2022 Accept, Readers: Everyone
Keywords: neural ordinary differential equations, nesterov, momentum
TL;DR: We propose Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's accelerated gradient method, to speed up the training and inference of NODEs.
Abstract: We propose Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's accelerated gradient (NAG) method, and a generalization called GNesterovNODEs. Taking advantage of the $\mathcal{O}(1/k^{2})$ convergence rate of the NAG scheme, GNesterovNODEs speed up training and inference by reducing the number of function evaluations (NFEs) needed to solve the ODEs. We also prove that the adjoint state of a GNesterovNODE itself satisfies a GNesterovNODE, which accelerates both the forward and backward ODE solvers and allows the model to be scaled up for large-scale tasks. We empirically corroborate the advantage of GNesterovNODEs on a wide range of practical applications, including point cloud separation, image classification, and sequence modeling. Compared to NODEs, GNesterovNODEs require significantly fewer NFEs while achieving better accuracy across our experiments.
Supplementary Material: zip
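
For readers unfamiliar with the "second-order ODE limit" mentioned in the abstract, the following is a background sketch of the well-known continuous-time limit of NAG for a smooth convex objective $F$. It is not taken from this page. The symbols $F$, $s$, $x_k$, $y_k$, $X$, $h$, $v$, and $f$ are notation assumed here for illustration, and the last pair of equations is only a plausible reading of how a NesterovNODE layer could be written; the exact form and sign convention should be checked against the paper.

$$
\begin{aligned}
x_k &= y_{k-1} - s\,\nabla F(y_{k-1}), \\
y_k &= x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}).
\end{aligned}
$$

Letting the step size $s \to 0$ with $t \approx k\sqrt{s}$, the iterates follow the second-order ODE

$$
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla F(X(t)) = 0,
$$

and the discrete scheme attains $F(x_k) - F^{\star} = \mathcal{O}(1/k^{2})$ for a sufficiently small step size. Assuming the gradient term is replaced by a learned network $f(h(t), t, \theta)$, the layer dynamics can be rewritten as a first-order system that a standard ODE solver can integrate:

$$
\begin{aligned}
\dot{h}(t) &= v(t), \\
\dot{v}(t) &= -\frac{3}{t}\,v(t) - f(h(t), t, \theta).
\end{aligned}
$$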