TL;DR: We propose a permutation search method using a straight-through estimator for merging multiple neural network models.
Abstract: Ainsworth et al. empirically demonstrated that linear mode connectivity (LMC) can be achieved between two independently trained neural networks (NNs) by applying an appropriate parameter permutation. LMC is satisfied if a linear path with non-increasing test loss exists between the models, suggesting that NNs trained with stochastic gradient descent (SGD) converge to a single approximately convex low-loss basin under permutation symmetries. However, Ainsworth et al. verified LMC only for pairs of models and provided limited discussion of its extension to multiple models. In this paper, we conduct a more detailed empirical analysis. First, we show that existing permutation search methods designed for two models can fail to map multiple models into the same convex low-loss basin. Next, we propose a permutation search method using a straight-through estimator for multiple models (STE-MM). We then experimentally demonstrate that, with STE-MM, the test loss of the merged model remains nearly the same as the losses of the original models even when multiple models are merged, and that the loss barriers between all pairs of permuted models are also small. Additionally, from the perspective of the trace of the Hessian matrix, we show that the loss sharpness around the merged model decreases as the number of models increases with STE-MM, indicating that LMC for multiple models is more likely to hold. The source code implementing our method is available at https://github.com/e5-a/STE-MM.
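To make the quantities in the abstract concrete, the sketch below (plain NumPy, not the authors' STE-MM implementation; all names are illustrative) shows the permutation symmetry of a two-layer MLP, merging by parameter averaging, and the loss barrier along a linear path, i.e., the quantity that LMC requires to be small. It also shows why a permutation search is needed before averaging: a model and an unaligned permuted copy of itself compute the same function, yet the linear path between their parameters generally does not, whereas undoing the permutation first makes the barrier vanish.

import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(params, x):
    # Two-layer MLP: relu(x @ W1 + b1) @ W2 + b2
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def permute_hidden(params, perm):
    # Permute the hidden units (and the rows of the output weights accordingly);
    # the function computed by the network is unchanged.
    W1, b1, W2, b2 = params
    return (W1[:, perm], b1[perm], W2[perm, :], b2)

def average_params(models):
    # Merge K models by element-wise parameter averaging.
    return tuple(np.mean([m[i] for m in models], axis=0) for i in range(len(models[0])))

def interpolate(p_a, p_b, lam):
    # Point on the linear path (1 - lam) * p_a + lam * p_b in parameter space.
    return tuple((1.0 - lam) * a + lam * b for a, b in zip(p_a, p_b))

def mse_loss(params, x, y):
    return float(np.mean((mlp_forward(params, x) - y) ** 2))

def loss_barrier(p_a, p_b, x, y):
    # Maximum loss along the linear path minus the worse endpoint loss.
    path_max = max(mse_loss(interpolate(p_a, p_b, lam), x, y)
                   for lam in np.linspace(0.0, 1.0, 11))
    return path_max - max(mse_loss(p_a, x, y), mse_loss(p_b, x, y))

# Toy data and a toy model with randomly drawn weights (for illustration only).
d_in, d_hidden, d_out, n = 4, 8, 1, 64
x = rng.normal(size=(n, d_in))
y = rng.normal(size=(n, d_out))
params = (rng.normal(size=(d_in, d_hidden)), rng.normal(size=d_hidden),
          rng.normal(size=(d_hidden, d_out)), rng.normal(size=d_out))

# Permutation symmetry: the permuted model computes exactly the same function.
perm = rng.permutation(d_hidden)
permuted = permute_hidden(params, perm)
assert np.allclose(mlp_forward(params, x), mlp_forward(permuted, x))

# Alignment before averaging: `permuted` equals `params` as a function, but the linear
# path between the two parameter settings generally passes through different functions,
# so the barrier is typically nonzero. Undoing the permutation (the job of a permutation
# search) makes the path constant and the barrier exactly zero.
aligned = permute_hidden(permuted, np.argsort(perm))
print("barrier, unaligned pair:", loss_barrier(params, permuted, x, y))
print("barrier, aligned pair:  ", loss_barrier(params, aligned, x, y))
merged = average_params([params, aligned])
print("merged == original after alignment:",
      all(np.allclose(a, b) for a, b in zip(merged, params)))

In STE-MM itself, as described in the abstract, the permutations are searched for several independently trained models at once using a straight-through estimator, and the merged model is the average of all permuted models; the sketch above only illustrates the symmetry, the averaging step, and the barrier being measured.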
Lay Summary: Neural networks are the backbone of modern AI, and they’re usually trained from scratch to solve specific tasks. But what happens when we train several of them separately—can we somehow combine their knowledge?
Recent research showed that two separately trained neural networks can often be “aligned” by rearranging their internal parts, allowing them to be blended without degrading performance. This led us to wonder: can the same approach be applied to more than two networks?
Our study shows that current methods don’t scale well to multiple networks. So we developed a new technique called STE-MM, which intelligently aligns and merges multiple models simultaneously. As a result, the merged model performs just as well as the original ones—and in some cases, it’s even more stable.
This is an important step toward more flexible AI systems, where different trained models can be combined rather than starting from scratch each time.
Link To Code: https://github.com/e5-a/STE-MM
Primary Area: Deep Learning->Everything Else
Keywords: Linear mode connectivity, deep learning, permutation symmetry
Submission Number: 9337