Abstract: Neural ordinary differential equations (neural ODEs) are a widely used class of deep learning models characterized by continuous depth. Understanding their generalization error bound is important for evaluating how well a model is expected to perform on new, unseen data. Earlier work in this direction considered a linear dynamics function (the function that models the evolution of the state variables) of a neural ODE (Marion, 2023). Other related work derives a bound for neural controlled ODEs that depends on the sampling gap (Bleistein & Guilloux, 2023). We consider a class of neural ODEs with a general nonlinear dynamics function, in both the time-dependent and time-independent cases, that is Lipschitz continuous with respect to the state variables. We observe that the solution of a neural ODE is of bounded variation if the dynamics function is Lipschitz continuous with respect to the hidden state. We derive a generalization bound for time-dependent and time-independent neural ODEs and show the effect of overparameterization and of the domain bound on the generalization error bound. To the best of our knowledge, this is the first generalization bound for neural ODEs with a general nonlinear dynamics function.
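As a rough, hedged illustration of the bounded-variation claim (not taken from the submission; the constants R and C below are assumptions made for the sketch), suppose the dynamics f is L-Lipschitz in the hidden state, the trajectory satisfies \|h(t)\| \le R on [0, T], and \sup_t \|f(0, t)\| \le C. Then the total variation of the solution can be bounded as

\begin{aligned}
\mathrm{TV}_{[0,T]}(h)
  &= \int_0^T \|\dot h(t)\|\,\mathrm{d}t
   = \int_0^T \|f(h(t), t)\|\,\mathrm{d}t \\
  &\le \int_0^T \bigl(\|f(h(t), t) - f(0, t)\| + \|f(0, t)\|\bigr)\,\mathrm{d}t \\
  &\le \int_0^T \bigl(L\,\|h(t)\| + C\bigr)\,\mathrm{d}t
   \;\le\; T\,(L R + C),
\end{aligned}

so a Lipschitz dynamics over a bounded domain and a finite time horizon forces the trajectory to have bounded variation.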
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=odqo93kUX0
Changes Since Last Submission: We removed the graph of the generalization gap versus the regularization parameter for the time-independent neural ODE, as the results obtained with weight decay did not align with the theoretical findings. Instead, we added two experiments showing the generalization gap versus the Lipschitz constant for neural ODEs on MNIST and CIFAR-10. These experiments highlight the importance of controlling the Lipschitz continuity of the learned dynamics in neural ODEs to ensure good generalization. They also provide empirical support for the theoretical observation that models with smoother transformations (lower Lipschitz constants) are less prone to overfitting.
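As a hedged sketch of how such a quantity could be measured (this is not the authors' code; the function name empirical_lipschitz, the dynamics signature f(t, h), and the pair-sampling scheme are assumptions), one could lower-bound the Lipschitz constant of a learned dynamics function by finite differences over sampled hidden states:

import torch

def empirical_lipschitz(f, states, t=0.0, n_pairs=1024, eps=1e-8):
    # Crude empirical lower bound on the Lipschitz constant of the dynamics
    # f(t, h) with respect to the hidden state h, using random pairs of states.
    # `states` is an (N, d) tensor of hidden states collected along trajectories.
    idx = torch.randint(0, states.shape[0], (n_pairs, 2))
    h1, h2 = states[idx[:, 0]], states[idx[:, 1]]
    t_tensor = torch.as_tensor(t, dtype=states.dtype)
    num = (f(t_tensor, h1) - f(t_tensor, h2)).flatten(1).norm(dim=1)
    den = (h1 - h2).flatten(1).norm(dim=1).clamp_min(eps)
    return (num / den).max().item()

Plotting this estimate against the measured generalization gap across trained models is one plausible way to produce curves of the kind described above.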
Assigned Action Editor: ~Niki_Kilbertus1
Submission Number: 5325