Unveiling Linear Mode Connectivity of Re-basin from Neuron Distribution Perspective

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: linear mode connectivity, non-uniformity and entropy, neuron alignment, permutation invariance, model fusion
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We investigated the role of neuron distribution entropy in enhancing linear mode connectivity.
Abstract: In deep learning, stochastic gradient descent (SGD) finds many minima that are functionally similar yet far apart in parameter space. Linearly interpolating between two such SGD solutions traces a slice of the loss landscape known as linear mode connectivity (LMC), along which loss barriers usually arise. Improving LMC plays an important role in model ensembling, model fusion, and federated learning. Previous re-basin works exploit permutation symmetry to map different solutions into the same basin and thereby reduce the barriers along the LMC path. However, re-basin methods work poorly in early training and only begin to improve LMC after several epochs. Moreover, their performance is usually suboptimal: they find permutations that reduce the barrier but cannot eliminate it (or the reduction is marginal). There is no unified theory on when and why re-basin improves LMC beyond chance, and unveiling the underlying mechanism is fundamental both to improving re-basin approaches and to further understanding the loss landscape and training dynamics of deep learning. In this paper, we therefore propose a theory from the neuron distribution perspective to demystify the mechanism behind the LMC of re-basin. We use Shannon entropy to quantify the uniformity of neuron distributions and derive that non-uniformity (an entropy decrease) leads to better LMC after re-basin. In accordance with our theory, we present the following observations, all of which it aptly explains: i) the LMC of re-basin varies under different non-uniform initializations; ii) re-basin's LMC improvement emerges during training as the neuron distribution changes; iii) the LMC of re-basin changes when pruning with different pruning ratios. Building upon these findings, we further show how to apply our theory to refine neuron alignment methods beyond re-basin, e.g., OTFusion and FedMA.
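For readers unfamiliar with the two quantities the abstract relies on, the following minimal sketch (not the authors' code; model, data loader, and function names are illustrative assumptions) shows one common way to measure the LMC loss barrier along the linear path between two solutions, and a simple Shannon entropy of a layer's per-neuron weight-norm distribution as a proxy for neuron-distribution uniformity.

```python
# Minimal sketch, assuming two trained models with identical architectures.
# Not the paper's implementation; names and choices here are illustrative.
import torch
import torch.nn.functional as F


def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * A + alpha * B."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}


@torch.no_grad()
def lmc_barrier(model, sd_a, sd_b, loader, num_points=11):
    """Barrier = max loss along the linear path minus the mean endpoint loss."""
    losses = []
    for alpha in torch.linspace(0.0, 1.0, num_points):
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        total, n = 0.0, 0
        for x, y in loader:
            total += F.cross_entropy(model(x), y, reduction="sum").item()
            n += x.size(0)
        losses.append(total / n)
    return max(losses) - 0.5 * (losses[0] + losses[-1])


def neuron_entropy(weight, eps=1e-12):
    """Shannon entropy of the normalized per-neuron weight-norm distribution.

    A lower value indicates a more non-uniform neuron distribution, the regime
    the abstract associates with better LMC after re-basin.
    """
    norms = weight.flatten(1).norm(dim=1)
    p = norms / norms.sum()
    return -(p * torch.log(p + eps)).sum().item()
```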
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7417