MoXCo:How I learned to stop exploring and love my local minima?

Published: 07 Nov 2023, Last Modified: 13 Dec 2023M3L 2023 PosterEveryoneRevisionsBibTeX
Keywords: Non-convex, inertial, momentum, global optimization, exploration
TL;DR: adaptive optimizers that have good generalization capabilities
Abstract: Deep Neural Networks (DNNs) are well-known for their generalization capabilities despite overparameterization. This is commonly attributed to the optimizer’s ability to find “good” solutions within high-dimensional loss landscapes. However, widely employed adaptive optimizers, such as ADAM, may suffer from subpar generalization. This paper presents an innovative methodology, $\textit{MoXCo}$, to address these concerns by designing adaptive optimizers that not only expedite exploration with faster convergence speeds but also ensure the avoidance of over-exploitation in specific parameter regimes, ultimately leading to convergence to good solutions.
Submission Number: 82