MoXCo: How I learned to stop exploring and love my local minima?

Published: 11 Feb 2025, Last Modified: 09 Mar 2025 · CPAL 2025 (Proceedings Track) Poster · CC BY 4.0
Keywords: optimization, deep learning, adaptive methods
TL;DR: This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to flat local minima.
Abstract: Deep neural networks are well-known for their generalization capabilities, largely attributed to optimizers’ ability to find “good” solutions in high-dimensional loss landscapes. This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to these “good” solutions. Our approach shifts the optimization paradigm from merely finding solutions quickly to discovering solutions that generalize well, establishing a careful balance between optimization efficiency and model generalization. We empirically validate our claims using a two-layer, fully connected neural network with ReLU activation, and demonstrate practical applicability through binary quantization of ResNets. Our numerical results demonstrate that these adaptive optimizers facilitate exploration, leading to faster convergence, and narrow the generalization gap between stochastic gradient descent and other adaptive methods.
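The abstract does not spell out the MoXCo update rule, so the sketch below is only an illustrative assumption of what "favoring flat minima" can look like in practice: a generic sharpness-aware-style step that descends the gradient taken at a nearby perturbed point. The function name flat_minima_step, the radius rho, the learning rate, and the toy two-layer ReLU network are hypothetical choices for illustration, not the authors' method.

# Hypothetical sketch, not the paper's MoXCo optimizer: one generic way to bias
# a gradient step toward flat minima by descending the gradient evaluated at a
# nearby perturbed point (a sharpness-aware-style heuristic).
import torch

def flat_minima_step(model, loss_fn, data, target, lr=0.1, rho=0.05):
    """One illustrative update that penalizes sharp regions of the loss landscape."""
    params = list(model.parameters())

    # Gradient at the current parameters.
    loss = loss_fn(model(data), target)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12

    # Move to the locally "sharpest" nearby point within radius rho.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)

    # The gradient at the perturbed point approximates the worst-case
    # neighborhood loss; descending it discourages convergence to sharp minima.
    loss_perturbed = loss_fn(model(data), target)
    grads_perturbed = torch.autograd.grad(loss_perturbed, params)

    with torch.no_grad():
        for p, g, gp in zip(params, grads, grads_perturbed):
            p.sub_(rho * g / grad_norm)  # undo the perturbation
            p.sub_(lr * gp)              # descend the neighborhood gradient

    return loss.item()

# Toy usage on a two-layer ReLU network (matching the setting in the abstract);
# the shapes and random data here are placeholders.
net = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
flat_minima_step(net, torch.nn.MSELoss(), x, y)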
Submission Number: 73