MoXCo: How I learned to stop exploring and love my local minima?

Published: 11 Feb 2025, Last Modified: 09 Mar 2025 · CPAL 2025 (Proceedings Track) Poster · CC BY 4.0
Keywords: optimization, deep learning, adaptive methods
TL;DR: This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to flat local minima.
Abstract: Deep neural networks are well-known for their generalization capabilities, largely attributed to optimizers’ ability to find “good” solutions in high-dimensional loss landscapes. This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to these “good” solutions. Our approach shifts the optimization paradigm from merely finding solutions quickly to discovering solutions that generalize well, establishing a careful balance between optimization efficiency and model generalization. We empirically validate our claims using a two-layer, fully connected neural network with ReLU activation, and demonstrate practical applicability through binary quantization of ResNets. Our numerical results demonstrate that these adaptive optimizers facilitate exploration, leading to faster convergence, and narrow the generalization gap between stochastic gradient descent and other adaptive methods.
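The abstract does not spell out the MoXCo update rule, so the sketch below is only an illustrative assumption of what "favoring flat minima" can look like in practice: a generic sharpness-aware-style step that descends the gradient taken at a nearby perturbed point. The function name flat_minima_step, the radius rho, the learning rate, and the toy two-layer ReLU network are hypothetical choices for illustration, not the authors' method.

# Hypothetical sketch, not the paper's MoXCo optimizer: one generic way to bias
# a gradient step toward flat minima by descending the gradient evaluated at a
# nearby perturbed point (a sharpness-aware-style heuristic).
import torch

def flat_minima_step(model, loss_fn, data, target, lr=0.1, rho=0.05):
    """One illustrative update that penalizes sharp regions of the loss landscape."""
    params = list(model.parameters())

    # Gradient at the current parameters.
    loss = loss_fn(model(data), target)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12

    # Move to the locally "sharpest" nearby point within radius rho.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)

    # The gradient at the perturbed point approximates the worst-case
    # neighborhood loss; descending it discourages convergence to sharp minima.
    loss_perturbed = loss_fn(model(data), target)
    grads_perturbed = torch.autograd.grad(loss_perturbed, params)

    with torch.no_grad():
        for p, g, gp in zip(params, grads, grads_perturbed):
            p.sub_(rho * g / grad_norm)  # undo the perturbation
            p.sub_(lr * gp)              # descend the neighborhood gradient

    return loss.item()

# Toy usage on a two-layer ReLU network (matching the setting in the abstract);
# the shapes and random data here are placeholders.
net = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
flat_minima_step(net, torch.nn.MSELoss(), x, y)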
Submission Number: 73