TL;DR: CoTo: a progressive adapter-dropping strategy that boosts LoRA with better generalization, more effective merging and pruning, and lower training overhead.
Abstract: Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
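To make the training strategy described in the abstract concrete, below is a minimal, illustrative sketch of a CoTo-style progressive adapter-dropping gate in PyTorch. The linear warm-up of the activation probability, the per-layer `scaling` attribute, the 1/p rescaling of kept adapters, and the function names are assumptions made for illustration, not the authors' exact implementation; see the linked repository for the reference code.

```python
import torch


def activation_prob(step: int, total_steps: int, warmup_frac: float = 0.75) -> float:
    """Activation probability p(t): ramps linearly from 0 to 1 over the first
    `warmup_frac` of training, then stays at 1 so all adapters are active.
    (The linear schedule and warm-up fraction are illustrative assumptions.)"""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    return min(1.0, step / warmup_steps)


def set_adapter_gates(lora_layers, base_scalings, p: float) -> None:
    """Stochastically deactivate adapters for the current step.

    Each layer's adapter is kept independently with probability p; dropped
    adapters get a scaling of 0, and kept adapters are rescaled by 1/p so the
    expected adapter output matches full activation (inverted-dropout
    convention, assumed here rather than taken from the paper)."""
    keep = torch.bernoulli(torch.full((len(lora_layers),), p))
    for layer, base, k in zip(lora_layers, base_scalings, keep):
        layer.scaling = base / p if k.item() == 1.0 else 0.0


# Schematic usage inside a fine-tuning loop:
#   base_scalings = [layer.scaling for layer in lora_layers]
#   for step in range(total_steps):
#       p = activation_prob(step, total_steps)
#       set_adapter_gates(lora_layers, base_scalings, p)
#       ... forward / backward / optimizer step ...
```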
Lay Summary: Big AI models are powerful but expensive to teach new skills. Low-Rank Adaptation (LoRA) helps by training only tiny “adapter” modules, yet the standard practice trains all adapters at once—like dumping a jumbo DLC bundle onto a AAA game. When everything changes at once, it’s tough to trace bugs, measure each pack’s value, or see which tweaks really help.
We introduce CoTo, a progressive adapter-dropping strategy that raises each adapter’s activation probability step by step. CoTo rolls adapters out gradually, flicking them on and off at random so the system can test how every new piece meshes with what’s already installed. This stochastic rollout reveals each adapter’s marginal contribution and lets the adapters cooperate smoothly once they all run together full-time.
Across diverse language and vision benchmarks, CoTo slips into existing LoRA pipelines with zero extra cost, consistently helps large models absorb new knowledge faster, and greatly improves the ability to merge multiple adapters into a single, stronger model.
Link To Code: https://github.com/zwebzone/coto
Primary Area: Deep Learning->Everything Else
Keywords: Parameter-efficient fine-tuning, linear mode connectivity, low-rank adaptation, model merging
Submission Number: 6574