Keywords: low rank adaptation, multi-task learning, mixture of experts, model adaptation, parameter efficient fine tuning
Abstract: Low-Rank Adaptation (LoRA) is the de-facto method for parameter-efficient fine-tuning of Vision Transformers (ViTs). However, when applied to multi-task learning, the conventional practice of training a LoRA module for each task independently leads to misaligned feature subspaces at inference, i.e., the semantic meanings of a feature dimension from two different LoRA modules are not aligned and may cancel each other in the worst cases. Current solutions employ parameter regularization or feature routing, but they operate under the flawed assumption that task subspaces are independent, which does not hold in practice, resulting in limited improvements. In this paper, we first analyze the conflict problem on multiple multi-task datasets and make two key observations. First, we reveal that LoRA's high singular value components encode discriminative information, while low singular value components accumulate noise. Second, we identify a critical source of feature misalignment from the perspective of the gradient: applying LoRA modules to component-level matrices ($W_q$, $W_k$, $W_v$) rather than block-level ones may amplify conflicting gradients during backpropagation. Based on these observations, we develop an add-on, plug-and-play solution for multi-task LoRA. Specifically, we propose 1) fine-grained routing with 2) spectrum-aware regularization, and 3) block-level LoRA adaptation. Their integration with the best baseline methods, such as HydraLoRA, delivers large-margin improvements and state-of-the-art results. We name our final integrated approach mtLoRA. The efficacy of mtLoRA is validated through extensive experiments on a variety of multi-task benchmarks, including natural language understanding (Dolly-15K), cross-domain adaptation (DOTA), and fine-grained classification (iNaturalist), where it outperforms current multi-task LoRA variants.
An ablation study further shows that our core contributions, spectrum-aware regularization, fine-grained routing, and block-level adaptation, are each instrumental in achieving these performance improvements.
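The abstract does not include implementation details, but the distinction it draws between component-level and block-level LoRA can be illustrated with a minimal sketch. All shapes, names, and the toy rank below are assumptions for illustration, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank (assumed values)

# Component-level LoRA: an independent low-rank update per projection
# matrix (W_q, W_k, W_v), each learning its own subspace.
component_deltas = {
    name: rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
    for name in ("q", "k", "v")
}

# Block-level LoRA (as the abstract advocates): a single low-rank
# update over the fused QKV projection, so all three projections
# share one adapted subspace instead of three independent ones.
A = rng.normal(size=(3 * d, r))  # down-projection
B = rng.normal(size=(r, d))      # up-projection
block_delta = A @ B              # shape (3d, d), rank at most r

assert block_delta.shape == (3 * d, d)
assert np.linalg.matrix_rank(block_delta) <= r
```

The shared factor `B` in the block-level variant is what couples the gradients of the three projections, which is one plausible reading of why the abstract expects it to reduce conflicting gradients across tasks.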
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 16292