On Convexity and Linear Mode Connectivity in Neural Networks

Published: 23 Nov 2022, Last Modified: 05 May 2023 · OPT 2022 Poster
Keywords: Neural networks, convexity, mode connectivity
TL;DR: After a small portion of training, SGD finds convex regions in the loss landscape that are not defined by the functional similarity of the solutions inside.
Abstract: In many cases, neural networks trained with stochastic gradient descent (SGD) that share an early and often small portion of the training trajectory have solutions connected by a linear path of low loss. This phenomenon, called linear mode connectivity (LMC), has been leveraged for pruning and model averaging in large neural network models, but it is not well understood how broadly or why it occurs. LMC suggests that SGD trajectories somehow end up in a "convex" region of the loss landscape and stay there. In this work, we confirm that this eventually does happen by finding a high-dimensional convex hull of low loss between the endpoints of several SGD trajectories. But to our surprise, simple measures of convexity do not show any obvious transition at the point when SGD will converge into this region. To understand this convex hull better, we investigate the functional behaviors of its endpoints. We find that only a small number of correct predictions are shared between all endpoints of a hull, and an even smaller number of correct predictions are shared between the hulls, even when the final accuracy is high for every endpoint. Thus, we tie LMC more tightly to convexity, and raise several new questions about the source of this convexity in neural network optimization.
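
The abstract describes probing the loss inside the convex hull spanned by the endpoints of several SGD trajectories. The sketch below is not the authors' code; it is a minimal PyTorch illustration of that measurement under stated assumptions: a list `models` of trained networks with identical architecture, a `loss_fn`, and a `data_loader` are assumed to exist. It samples random convex combinations of the endpoints' flattened parameter vectors and evaluates the loss at each sampled point.

```python
# Minimal sketch (not the authors' implementation): evaluate the loss at random
# points inside the convex hull of several trained SGD endpoints.
import copy
import torch


def flatten_params(model):
    """Concatenate all parameters of a model into one 1-D CPU tensor."""
    return torch.cat([p.detach().reshape(-1).cpu() for p in model.parameters()])


def load_flat_params(model, flat):
    """Write a flat parameter vector back into a model (in place)."""
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p.copy_(flat[offset:offset + n].view_as(p))
            offset += n


@torch.no_grad()
def average_loss(model, loss_fn, data_loader, device="cpu"):
    """Mean loss of `model` over `data_loader` (loss_fn assumed to return a mean)."""
    model.eval().to(device)
    total, count = 0.0, 0
    for x, y in data_loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * len(y)
        count += len(y)
    return total / count


def sample_hull_losses(models, loss_fn, data_loader, n_samples=20, device="cpu"):
    """Loss at random convex combinations of the endpoints' parameters.

    Low loss at every sampled point is evidence that the endpoints lie in a
    common low-loss convex region, as described in the abstract.
    """
    endpoints = torch.stack([flatten_params(m) for m in models])  # shape (k, d)
    probe = copy.deepcopy(models[0])                              # scratch model
    losses = []
    for _ in range(n_samples):
        # Dirichlet(1, ..., 1) samples weights uniformly from the simplex.
        weights = torch.distributions.Dirichlet(torch.ones(len(models))).sample()
        load_flat_params(probe, weights @ endpoints)
        losses.append(average_loss(probe, loss_fn, data_loader, device))
    return losses
```

With two endpoints and weights (1 - t, t) for t in [0, 1], the same routine reduces to the usual two-model linear mode connectivity check along a single interpolation path.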