Adaptive norms for deep learning with regularized Newton methods

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Stochastic Optimization, Non-convex Optimization, Deep Learning, Adaptive methods, Newton methods, Second-order optimization
Abstract: We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks. This approach can be seen as a second-order counterpart of adaptive gradient methods, which we show here to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we prove that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities on general non-convex objectives. Furthermore, we run experiments across different neural architectures and datasets and find that the ellipsoidal constraints consistently outperform their spherical counterpart, both in the number of backpropagations and in the asymptotic loss value. Finally, we find performance comparable to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of computational time.
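The following is a minimal sketch (not the authors' code) of the interpretation stated in the abstract: an RMSProp-style step can be read as the minimizer of a first-order (linear) model inside an ellipsoidal trust region whose shape is given by the RMSProp preconditioner. All names (g, v, eps, radius) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5)             # stochastic gradient (illustrative)
v = rng.uniform(0.1, 1.0, size=5)  # running average of squared gradients
eps = 1e-8
radius = 0.1                       # trust-region radius

# Ellipsoidal norm ||s||_A = sqrt(s^T A s) with diagonal A = diag(sqrt(v) + eps),
# stored here as the vector of its diagonal entries.
A = np.sqrt(v) + eps

# Closed-form minimizer of the linear model  g^T s  subject to  ||s||_A <= radius:
#   s* = -radius * A^{-1} g / sqrt(g^T A^{-1} g)
s_star = -radius * (g / A) / np.sqrt(np.sum(g * g / A))

# RMSProp direction (up to the step length): -g / (sqrt(v) + eps)
rmsprop_dir = -g / A

# The two vectors are positive multiples of each other, i.e. the trust-region
# step points in the RMSProp direction.
cos = s_star @ rmsprop_dir / (np.linalg.norm(s_star) * np.linalg.norm(rmsprop_dir))
print(f"cosine similarity: {cos:.6f}")  # ~1.0
```

Under these assumptions, the ellipsoidal constraint only rescales the step; replacing the linear model with a quadratic one is what the abstract refers to as the second-order counterpart.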
One-sentence Summary: This paper proposes second-order variants of adaptive gradient methods such as RMSProp.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=uxxexpvfvW
