TL;DR: We propose a generalization of adaptive gradient methods to second-order algorithms.
Abstract: We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we show here to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities. Furthermore, we run experiments across different neural architectures and datasets and find that the ellipsoidal constraints consistently outperform their spherical counterpart, both in terms of number of backpropagations and asymptotic loss value. Finally, we find comparable performance to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of time.
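The trust-region reading of adaptive gradient methods mentioned in the abstract can be made concrete with a small sketch. The code below is illustrative only and not the authors' implementation; the names `rmsprop_scaling` and `first_order_ellipsoidal_tr_step` are ours. It builds a diagonal RMSProp-style preconditioner and solves the linear trust-region subproblem under the induced ellipsoidal norm; choosing the radius proportional to the dual norm of the gradient recovers the familiar RMSProp/Adam-style preconditioned step.

```python
import numpy as np

def rmsprop_scaling(grads, beta=0.9, eps=1e-8):
    """Diagonal scaling d = sqrt(EMA of squared gradients) + eps,
    as used by RMSProp/Adam; it defines the ellipsoidal norm
    ||s||_D = sqrt(sum_i d_i * s_i**2) with D = diag(d)."""
    v = np.zeros_like(grads[0])
    for g in grads:
        v = beta * v + (1.0 - beta) * g**2
    return np.sqrt(v) + eps

def first_order_ellipsoidal_tr_step(g, d, radius):
    """Minimize the linear model g^T s subject to ||s||_D <= radius.
    Closed form: s = -radius * D^{-1} g / ||g||_{D^{-1}}.
    Setting radius = lr * ||g||_{D^{-1}} yields s = -lr * g / d,
    i.e. the RMSProp/Adam preconditioned update (first-order
    trust-region view with an ellipsoidal constraint)."""
    dinv_g = g / d
    dual_norm = np.sqrt(np.dot(g, dinv_g))
    return -radius * dinv_g / dual_norm

# Toy usage: two past gradients, then one ellipsoidal trust-region step.
grads = [np.array([0.5, -2.0]), np.array([1.0, -1.5])]
d = rmsprop_scaling(grads)
step = first_order_ellipsoidal_tr_step(grads[-1], d, radius=0.1)
print(step)
```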
Code: https://www.dropbox.com/sh/cs8cokvhirjfit1/AAA-NiyMXGCrZsJPBFAXbAHUa?dl=0
Keywords: non-convex, optimization, neural networks, trust-region