Leveraging Rotation Symmetry for Efficient LoRA Merging in Large Language Models

16 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Natural Language Processing, Large Language Models, Model Merging, Low-Rank Adaptation
TL;DR: The paper introduces TSPA, a two-stage parameter alignment framework that effectively performs multi-LoRA merging in LLMs, alleviating parameter interference while outperforming existing methods in robustness and scalability.
Abstract: Merging many low-rank adaptations (LoRAs) is a key technique for improving the integration and deployment efficiency of large language models (LLMs). However, this process has long been hindered by the problem of parameter interference, which often causes a sharp drop in model performance after merging. Existing merging methods are vulnerable to complex conflicts, such as those arising from high-rank LoRAs. While classical rotation alignment can improve robustness, it is difficult to apply because it is incompatible with the LoRA structure and has high computational complexity. To address these challenges, we propose a two-stage parameter alignment (TSPA) framework. TSPA overcomes the limitations of existing methods through two core strategies: (1) an alignment mechanism that operates within the LoRA low-rank space, which resolves the structural compatibility issue while preserving functional equivalence; and (2) an alignment paradigm that compares each LoRA against an average model, which reduces computational complexity from quadratic to linear in the number of LoRAs and ensures scalability. To guide the alignment, TSPA further introduces two complementary optimization objectives, macro-functional alignment and micro-parameter alignment, and solves the resulting problem with Stiefel manifold optimization, which keeps the rotation matrices orthogonal throughout the iterations. We conduct experiments on natural language processing (NLP) tasks with models such as Llama-3-8B. The results show that TSPA not only outperforms state-of-the-art (SOTA) baselines, including DARE, in average performance across tasks, but also offers distinct advantages: its two-stage design strikes the best balance between task capability and general knowledge; it is more robust than SOTA methods in high-rank, high-interference scenarios; and it is markedly effective at preserving fine-grained capabilities such as safety. This work presents a practical framework for efficient, powerful, and stable multi-task model merging.
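
The rotation symmetry the abstract relies on can be stated concretely: a LoRA update W = B A is unchanged when B is multiplied on the right by an orthogonal matrix R and A on the left by R^T, since B R R^T A = B A. The snippet below is a minimal numerical sketch of this property, not the authors' implementation; the dimensions, variable names, and the random rotation are illustrative assumptions. It shows why a rotation applied inside the low-rank space preserves functional equivalence, which is the degree of freedom an alignment method can exploit before averaging LoRAs.

```python
# Minimal sketch (illustrative only, not the paper's code): rotation symmetry of a
# LoRA factorization. For W = B @ A with B in R^{d_out x r} and A in R^{r x d_in},
# any orthogonal R in R^{r x r} gives a functionally equivalent pair (B R, R^T A).
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 8            # hypothetical layer sizes and LoRA rank

B = rng.standard_normal((d_out, r))   # LoRA "up" factor
A = rng.standard_normal((r, d_in))    # LoRA "down" factor

# Random orthogonal rotation in the r-dimensional low-rank space (QR of a Gaussian).
R, _ = np.linalg.qr(rng.standard_normal((r, r)))

B_rot, A_rot = B @ R, R.T @ A                 # rotated factors
assert np.allclose(B @ A, B_rot @ A_rot)      # same weight update: B R R^T A = B A

# Because the update is unchanged, such rotations can in principle be chosen to
# align the factors of different LoRAs before merging. Aligning every LoRA to a
# shared average model requires one rotation per LoRA (linear cost) rather than
# one per pair of LoRAs (quadratic cost), which is the scalability argument made
# in the abstract.
```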
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6634