D$^2$-LoRA: A Synergistic Approach to Differential and Directional Low-Rank Adaptation

ICLR 2026 Conference Submission 21196 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: parameter-efficient fine-tuning, low-rank adapters, directional normalization, mergeability, large language models
Abstract: We present a systematic exploration of the parameter-efficient fine-tuning design space under practical constraints, yielding D$^{2}$-LoRA, a method that reaches 76.4\% average accuracy on eight QA/RC benchmarks using only 5k training samples per task and two epochs, while retaining algebraic mergeability at inference with near-exact numerical equivalence. D$^{2}$-LoRA combines a differential signed low-rank residual with a directional per-column normalization applied only during training. Specifically, given a frozen $W_0$, we learn two rank-$r$ components forming an update $\Delta W=\tfrac{\alpha}{r}(A_+B_+-\tau A_-B_-)$. This update is then projected onto the original column norms of $W_0$ to yield $W^\star$, allowing optimization to adjust directional components while preserving the original column magnitudes. At inference time, $W^\star$ and $\Delta W$ are merged into a single matrix $\widehat W$, incurring no additional latency. Compared to baselines, D$^{2}$-LoRA achieves a +2.2pp macro-average improvement over LoRA (74.2\%) and matches or exceeds DoRA. It also preserves near-exact numerical equivalence after merging (mean gap $\approx 0.03$pp; worst $0.7$pp), while restoring $\sim1.91\times$ evaluation throughput. A geometric analysis explains why the projection stabilizes low-rank training, and ablation studies isolate the effects of the negative branch, rank, target modules, scoring function, and fixed~$\tau$.
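To make the update concrete, below is a minimal PyTorch sketch of the mechanism the abstract describes. It assumes the LoRA-style shapes $A_\pm \in \mathbb{R}^{d_\mathrm{out}\times r}$, $B_\pm \in \mathbb{R}^{r\times d_\mathrm{in}}$, and reads the "projection onto the original column norms" as rescaling each column of $W_0+\Delta W$ back to $W_0$'s column norm. The class name `D2LoRALinear`, the initialization, the $\epsilon$ guard, and the exact form of the merge are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class D2LoRALinear(nn.Module):
    """Sketch of a differential signed low-rank update with a training-time
    per-column directional rescaling, and a dense merge for inference."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0, tau: float = 0.5):
        super().__init__()
        self.base = base.requires_grad_(False)      # frozen W0 (and bias, if any)
        out_f, in_f = base.weight.shape
        self.r, self.alpha, self.tau = r, alpha, tau
        # Two signed rank-r branches; the initialization is an assumption.
        self.A_plus = nn.Parameter(torch.randn(out_f, r) * 0.01)
        self.B_plus = nn.Parameter(torch.zeros(r, in_f))
        self.A_minus = nn.Parameter(torch.randn(out_f, r) * 0.01)
        self.B_minus = nn.Parameter(torch.zeros(r, in_f))
        # Original column norms of W0, kept fixed as the magnitude target.
        self.register_buffer("col_norms", base.weight.detach().norm(dim=0, keepdim=True))

    def delta_w(self) -> torch.Tensor:
        # Delta W = (alpha / r) * (A+ B+  -  tau * A- B-)
        return (self.alpha / self.r) * (
            self.A_plus @ self.B_plus - self.tau * self.A_minus @ self.B_minus
        )

    def train_weight(self) -> torch.Tensor:
        # One reading of the directional step: rescale each column of
        # W0 + Delta W back to W0's original column norm (training only).
        W = self.base.weight + self.delta_w()
        direction = W / (W.norm(dim=0, keepdim=True) + 1e-8)
        return direction * self.col_norms           # W_star

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.train_weight(), self.base.bias)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold everything into one dense matrix so inference adds no extra latency.
        # (Exactly how the paper folds the normalization is a detail of the method;
        # here we simply materialize the training-time weight once.)
        out_f, in_f = self.base.weight.shape
        merged = nn.Linear(in_f, out_f, bias=self.base.bias is not None)
        merged.weight.copy_(self.train_weight())
        if self.base.bias is not None:
            merged.bias.copy_(self.base.bias)
        return merged
```

In this sketch the directional rescaling is recomputed on every forward pass during training; calling `merge()` once at evaluation time replaces that per-step work with a single dense matrix, which is the source of the throughput recovery the abstract reports.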
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 21196