Abstract: Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available at https://github.com/zhengzaiyi/RotationSymmetry
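The core invariance behind rotation symmetry can be checked directly: if the query and key projections of a self-attention layer are both right-multiplied by the same orthogonal (rotation) matrix, the attention logits are unchanged. The following minimal NumPy sketch illustrates this; it is an illustrative check under generic assumptions, not the paper's released implementation, and all dimensions and variable names are arbitrary.

```python
import numpy as np

# Sketch: attention logits are invariant under a shared rotation R of the
# query and key projections, since (X W_Q R)(X W_K R)^T = X W_Q W_K^T X^T.
rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 16, 8, 4

X   = rng.standard_normal((n_tokens, d_model))   # token representations
W_Q = rng.standard_normal((d_model, d_head))     # query projection
W_K = rng.standard_normal((d_model, d_head))     # key projection

# Random orthogonal matrix (orthonormal columns from a QR decomposition).
R, _ = np.linalg.qr(rng.standard_normal((d_head, d_head)))

logits_original = (X @ W_Q) @ (X @ W_K).T
logits_rotated  = (X @ W_Q @ R) @ (X @ W_K @ R).T

# The rotated model is functionally equivalent on the attention logits.
assert np.allclose(logits_original, logits_rotated)
```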
Lay Summary: Consider two AI models, one skilled in sentiment analysis and the other in logical reasoning. Our goal is to obtain a new model that performs well on both tasks, but building such a model from scratch is expensive. A common approach is to fuse the two models directly into a single one, for example by averaging their parameters, a technique known as model merging. But this is harder than it sounds: even if two models do the same job, their “internal wiring” might be arranged differently, making direct merging ineffective.
This paper presents a new method to align model parameters before merging. Imagine a space in which each model corresponds to a vector. Our approach rotates one of the models within this space to bring it closer to the other, much like aligning a screwdriver with a screw head before turning it. Since we ensure that the models remain functionally equivalent before and after alignment, model merging benefits from this alignment without any adverse side effects.
Our experiments show that this method improves the performance of merged models on both text and image tasks. This work provides a simple and effective way to make different model merging methods work better.
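As a rough intuition for the alignment step described above, the sketch below finds an orthogonal matrix that maps one model's projection weights onto another's (via SciPy's orthogonal Procrustes solver) and then averages the aligned weights. This is a hypothetical toy construction for illustration only; the setup, variable names, and use of Procrustes are assumptions here, not a description of the paper's actual matching algorithm.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

# Toy setup: model B is a rotated copy of model A's query/key projections.
rng = np.random.default_rng(1)
d_model, d_head = 16, 8

Wq_a = rng.standard_normal((d_model, d_head))
Wk_a = rng.standard_normal((d_model, d_head))
R_true, _ = np.linalg.qr(rng.standard_normal((d_head, d_head)))
Wq_b, Wk_b = Wq_a @ R_true, Wk_a @ R_true

# Find the orthogonal map that best aligns model B's queries with model A's.
R, _ = orthogonal_procrustes(Wq_b, Wq_a)   # minimizes ||Wq_b @ R - Wq_a||_F

# Apply the same rotation to the paired key projection (function preserved),
# then average; with a perfect match this recovers model A's weights.
Wq_merged = 0.5 * (Wq_a + Wq_b @ R)
Wk_merged = 0.5 * (Wk_a + Wk_b @ R)
assert np.allclose(Wq_merged, Wq_a) and np.allclose(Wk_merged, Wk_a)
```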
Link To Code: https://github.com/zhengzaiyi/RotationSymmetry
Primary Area: Deep Learning->Attention Mechanisms
Keywords: Rotation Symmetry, Parameter Matching, Model Fusion, Transformers
Submission Number: 8800