Two-Phase Head-Specific LoRA: Balancing Global and Local Adaptation in Multi-Head Attention

ICLR 2026 Conference Submission 17574 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Low-Rank Adaptation, LoRA, Head-Specific LoRA, Two-Phase Fine-Tuning, Multi-Head Attention, Parameter-Efficient Fine-Tuning
Abstract: Low-Rank Adaptation (LoRA) has become a standard technique for parameter-efficient fine-tuning of large pretrained models. However, applying a single low-rank update to the entire weight matrix assumes that all attention heads require the same adaptation, overlooking their diverse functional roles. Simply increasing rank under this setting often leads to diminishing returns and redundant parameter usage. To address this, we propose \textbf{Two-Phase Head-Specific LoRA (HS-LoRA)}. In the first phase, a global adapter---instantiated by any method that applies a shared update to the full multi-head weight matrix---absorbs broad domain-shift information common across heads. In the second phase, lightweight head-specific adapters refine residual variations, recovering individuality suppressed by the global update. This two-phase design disentangles adaptation into a shared global subspace and multiple head-specific residual subspaces, balancing efficiency with expressiveness. On the VTAB-1k benchmark, HS-LoRA yields substantial gains in Structured tasks (up to +7.59 pp) and shows complementary improvements when combined with global methods such as PiSSA and CaRA.
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 17574