Training-time Neuron Alignment for Improving Linear Mode Connectivity and Model Fusion

18 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: deep learning, linear mode connectivity, neuron alignment, permutation invariance, model fusion
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We study training-time neuron alignment for improving linear mode connectivity through the lens of subspaces.
Abstract: In deep learning, Stochastic Gradient Descent (SGD) finds different solutions that are functionally similar but far apart in parameter space. Linear Mode Connectivity (LMC) describes the loss landscape along the linear path connecting two SGD solutions, which often exhibits barriers. Current neuron alignment methods seek a network permutation that maps two SGD solutions into the same loss basin to improve LMC and model fusion. However, these methods are post-hoc and usually computationally expensive because of the astronomical number of permutation matrices. Can we realize training-time neuron alignment? In this paper, we hypothesize that it can be realized by learning in an effective subspace. We first provide a preliminary theoretical result to support the hypothesis. We then propose a subspace algorithm that partially fixes neuron weights to reduce the potential permutation symmetries without hurting accuracy. We find that applying our training-time alignment method largely improves LMC and reduces the computation required for post-hoc matching. Interestingly, we also find that random pruning at initialization can improve connectivity, which validates our subspace hypothesis. Lastly, we propose two algorithms that incorporate training-time neuron alignment into federated learning, showcasing its potential to boost model fusion even under heterogeneous datasets.
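The two notions the abstract relies on, permutation invariance and the loss barrier along a linear path, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the submission: the toy two-layer ReLU network, the helper names (mlp, permute_hidden, interpolate, barrier), the mean-squared-error loss, and the barrier definition (the largest gap between the loss on the path and the linear interpolation of the endpoint losses, which is one common convention).

```python
import random

def mlp(x, W1, b1, W2):
    """Toy two-layer ReLU network applied to a single input vector x."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden neurons: rows of W1, entries of b1, columns of W2."""
    return ([W1[p] for p in perm],
            [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])

def interpolate(params_a, params_b, alpha):
    """Elementwise (1 - alpha) * a + alpha * b over nested lists of floats."""
    def mix(a, b):
        if isinstance(a, list):
            return [mix(ai, bi) for ai, bi in zip(a, b)]
        return (1 - alpha) * a + alpha * b
    return tuple(mix(a, b) for a, b in zip(params_a, params_b))

def barrier(loss_fn, params_a, params_b, num_points=11):
    """Assumed definition: max over alpha of the path loss minus the
    linear interpolation of the two endpoint losses."""
    loss_a, loss_b = loss_fn(params_a), loss_fn(params_b)
    gaps = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        path_loss = loss_fn(interpolate(params_a, params_b, alpha))
        gaps.append(path_loss - ((1 - alpha) * loss_a + alpha * loss_b))
    return max(gaps)

# Hypothetical toy setup: random weights and inputs, targets from network A.
rng = random.Random(0)
d_in, d_hidden, d_out = 3, 6, 2
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
inputs = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(20)]

params_a = (W1, b1, W2)
targets = [mlp(x, *params_a) for x in inputs]

# Network B is network A with its hidden neurons shuffled.
perm = list(range(d_hidden))
rng.shuffle(perm)
params_b = permute_hidden(W1, b1, W2, perm)

def mse(params):
    """Mean squared error of the network against the fixed targets."""
    errs = [(o - t) ** 2
            for x, tgt in zip(inputs, targets)
            for o, t in zip(mlp(x, *params), tgt)]
    return sum(errs) / len(errs)

print("barrier between A and permuted A:", barrier(mse, params_a, params_b))
```

Networks A and B compute the same function, yet the straight line between them in parameter space can pass through higher loss. That gap is what post-hoc alignment methods try to remove by searching over permutations, and what a training-time approach would aim to avoid creating in the first place.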
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1333