Predicting LoRA Adapters Across Model Families: A Training-Free Approach via Anchor-Space Ridge Regression

07 May 2026 (modified: 09 May 2026), ICML 2026 Workshop CoLoRAI Submission, Readers: Everyone, CC BY 4.0

Keywords: LoRA, parameter-efficient fine-tuning, low-rank adapters, cross-model transfer, model merging, ridge regression

TL;DR: A training-free ridge regression maps LoRAs across model families using a few paired anchors, recovering 0.14 to 0.31 of the source-to-oracle accuracy gap in the best aggregate case. A global ridge or per-tensor PCA works best; nearest-anchor locality does not help, and single-domain anchor pools can break transfer.

Abstract: A LoRA fine-tuned on one base model is, in general, useless on a different base model: weight spaces are unrelated across families. We introduce a training-free predictor that, given a small set of \emph{anchor} task pairs (LoRAs trained on both source and target models), maps a source-side LoRA to a target-side LoRA via ridge regression in the anchor span. The mapping is closed-form, performs no gradient updates on the target model, and recovers a meaningful fraction of the source-to-oracle accuracy gap on held-out tasks. Across four model pairs spanning three scales and three target families (Qwen2.5 to Llama-3.2 at 1B and 3B, Qwen2.5-7B to Llama-3.1-8B, and Qwen2.5 to Gemma-2 at 2B), the best aggregate gap recovered lies in $[0.14,\,0.31]$, and the winning predictor is always either a single global ridge or a per-tensor PCA variant. Two findings stand out. First, per-tensor variants specialize by task family on a per-held-out basis: per-tensor PCA wins science and ties math, while a single global ridge wins code, and this per-task pattern reproduces at 1B, 3B, 8B, and on Pair~C. Second, the mapping is intrinsically non-local in anchor space at 3B: restricting the ridge to nearest-anchor neighborhoods does not help at any neighborhood size, and at 8B the $K$-sweep is flat within noise across $K \in \{2,\dots,24\}$, so locality never reliably outperforms the global ridge. We give a regression-theoretic explanation for both phenomena and report a sharp pool-composition failure mode in which collapsing the anchor pool to a single domain inverts the predictor's sign on cross-domain held-outs.
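The anchor-span ridge mapping described in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation: it assumes each LoRA is flattened to a single vector, fits ridge coefficients that express a new source-side LoRA as a combination of the source-side anchors, and then applies the same coefficients to the target-side anchors. The function name `predict_target_lora`, the regularization strength `lam`, and the random toy data are all hypothetical.

```python
import numpy as np

def predict_target_lora(S, T, s_new, lam=1e-2):
    """Closed-form ridge prediction in the anchor span (illustrative sketch).

    S     : (n_anchors, d_src) flattened source-side anchor LoRAs (assumed layout)
    T     : (n_anchors, d_tgt) flattened target-side anchor LoRAs (assumed layout)
    s_new : (d_src,) flattened source-side LoRA for a held-out task
    lam   : ridge regularization strength (hypothetical default)
    """
    n = S.shape[0]
    gram = S @ S.T  # (n_anchors, n_anchors) anchor Gram matrix
    # Dual-form ridge: coeffs = (S S^T + lam I)^{-1} S s_new
    coeffs = np.linalg.solve(gram + lam * np.eye(n), S @ s_new)
    # Carry the anchor coefficients over to the target-side anchors.
    return coeffs @ T, coeffs

# Toy data only: 8 anchor pairs with unrelated source/target dimensions.
rng = np.random.default_rng(0)
S = rng.normal(size=(8, 64))
T = rng.normal(size=(8, 48))

# A held-out source LoRA constructed to lie exactly in the anchor span.
w = rng.normal(size=8)
s_new = w @ S

t_pred, coeffs = predict_target_lora(S, T, s_new, lam=1e-8)
```

No gradient step touches the target model here; the only computation is a linear solve of size `n_anchors`. With a near-zero `lam` and a source LoRA inside the anchor span, the recovered coefficients match the mixing weights used to build it, which is a convenient sanity check before trying real adapters.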

Submission Number: 85
