Understanding Mode Connectivity via Parameter Space Symmetry

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We study why and when mode connectivity holds by linking topological properties of symmetry groups to those of the minimum.
Abstract: Neural network minima are often connected by curves along which the training and test loss remain nearly constant, a phenomenon known as mode connectivity. While this property has enabled applications such as model merging and fine-tuning, its theoretical explanation remains unclear. We propose a new approach to exploring the connectedness of minima using parameter space symmetry. By linking the topology of symmetry groups to that of the minima, we derive the number of connected components of the minima of linear networks and show that skip connections reduce this number. We then examine when mode connectivity and linear mode connectivity hold or fail, using parameter symmetries that account for a significant part of the minimum. Finally, we provide explicit expressions for symmetry-induced connecting curves within the set of minima. Using the curvature of these curves, we derive conditions under which linear mode connectivity approximately holds. Our findings highlight the role of continuous symmetries in understanding the neural network loss landscape.
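To make the symmetry argument concrete, the following minimal sketch (not the paper's code) illustrates it for a two-layer linear network f(x) = W2 W1 x with squared loss. The GL(h) symmetry (W1, W2) -> (G W1, W2 G^{-1}) leaves the loss unchanged, so the one-parameter subgroup G(t) = expm(t A) traces a loss-preserving curve between a minimum and its symmetry image, while the straight line between the same two points generally crosses a loss barrier. All dimensions, the generator A, and the least-squares construction of the minimum are illustrative assumptions; the sketch requires NumPy and SciPy.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d_in, h, d_out, n = 5, 4, 3, 100  # illustrative dimensions

X = rng.normal(size=(d_in, n))
W_star = rng.normal(size=(d_out, d_in))
Y = W_star @ X  # realizable targets, so minima attain zero loss

def loss(W1, W2):
    return np.mean((W2 @ W1 @ X - Y) ** 2)

# One minimum: fix W2 and solve W2 @ W1 = W_star by least squares
# (exact here, since W2 is generically surjective when d_out <= h).
W2 = rng.normal(size=(d_out, h))
W1 = np.linalg.lstsq(W2, W_star, rcond=None)[0]

# Its image under a symmetry G in GL(h): (W1, W2) -> (G @ W1, W2 @ inv(G)).
A = rng.normal(size=(h, h))
G = expm(A)  # a matrix exponential is always invertible
W1b, W2b = G @ W1, W2 @ np.linalg.inv(G)

print("loss at minimum:        ", loss(W1, W2))
print("loss at symmetry image: ", loss(W1b, W2b))

# Loss is constant along the symmetry-induced curve G(t) = expm(t * A),
# but the straight-line path between the two minima shows a barrier.
for t in np.linspace(0.0, 1.0, 5):
    Gt = expm(t * A)
    curve = loss(Gt @ W1, W2 @ np.linalg.inv(Gt))
    line = loss((1 - t) * W1 + t * W1b, (1 - t) * W2 + t * W2b)
    print(f"t={t:.2f}  curve: {curve:.2e}   linear path: {line:.2e}")
```

Running the sketch shows the curve loss staying at numerical zero while the linear path's loss rises and falls, matching the abstract's distinction between mode connectivity, which the symmetry guarantees here, and linear mode connectivity, which can fail.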
Lay Summary: Modern neural networks often have many solutions that perform equally well. Surprisingly, many of these solutions can be connected by paths along which performance stays consistently high. We wanted to understand why and when this happens. Our research shows that these connections often arise from symmetries: transformations of the network’s parameters that leave its performance unchanged. While the space of solutions is complex, the mathematical structure of these symmetries is well understood. By linking these two spaces, we use what we know about symmetries to uncover the structure of the solution landscape. This approach lets us count how many disconnected groups of solutions exist and shows how architectural features, like skip connections in ResNets, can make solutions more connected. We also derive exact formulas for constructing the connecting paths, instead of finding them by trial and error. Understanding how solutions are connected can improve how we merge, fine-tune, and ensemble trained models, helping to make machine learning systems more efficient and reliable.
Primary Area: Deep Learning
Keywords: symmetry, mode connectivity
Submission Number: 14800