MAPLE: Masked Adapter Prototype Learning for OOD generalization

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Mixture of LoRAs, OOD generalization, Large Language Models
TL;DR: This paper proposes a masked-adapter prototype learning scheme that fixes unreliable routing among multiple pretrained LoRAs by learning complementary, denoised prototypes, boosting OOD generalization.
Abstract: Parameter-efficient fine-tuning with adapters (e.g., LoRA) equips LLMs with task-specific skills. However, leveraging multiple pretrained adapters for out-of-distribution (OOD) generalization remains challenging. Existing techniques for OOD generalization with multiple pretrained LoRAs route inputs using LoRA representations (prototypes) obtained independently, assuming these representations capture complementary information. However, we observe that for existing methods, in-distribution and OOD routing entropies are often comparable, calling the complementarity assumption into question. We derive theoretical conditions under which this assumption is violated, distilling the cause down to the presence of shared, noisy prototype subspaces. Based on this, we introduce $\textbf{MAPLE (Masked-Adapter Prototype LEarning)}$, a simple learning framework that refines LoRA prototypes by masking the target task’s LoRA during prototype learning. In doing so, it encourages prototypes to discard noisy attributes, which improves routing and strengthens OOD generalization. Extensive experiments on language models of varying size, such as Phi-2 (2.7B) and LLaMA-3 (8B), equipped with heterogeneous pools of pretrained LoRAs show that MAPLE improves LoRA representations and thus achieves state-of-the-art performance across multiple benchmarks.
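To make the abstract's two ingredients concrete, here is a minimal, illustrative sketch of (a) prototype-based routing and its entropy, whose comparable in-distribution vs. OOD values are the symptom the paper identifies, and (b) a prototype update that masks the target task's own LoRA. All function names, the cosine-softmax router, and the mean-embedding update rule are assumptions for illustration only, not the paper's actual algorithm.

```python
import numpy as np

def routing_weights(x, prototypes, temperature=1.0):
    """Softmax routing over cosine similarities between an input
    embedding x and one prototype vector per pretrained LoRA.
    (Illustrative router; the paper's router may differ.)"""
    x = x / np.linalg.norm(x)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = protos @ x / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def routing_entropy(weights):
    """Shannon entropy of the routing distribution. Comparable
    entropies on ID and OOD inputs suggest unreliable routing."""
    w = np.clip(weights, 1e-12, 1.0)
    return float(-(w * np.log(w)).sum())

def masked_prototype_update(prototypes, task_embeds, target_idx, lr=0.1):
    """One illustrative refinement step: move each LoRA's prototype
    toward the mean embedding of its task, but mask (freeze) the
    target task's own LoRA so the remaining prototypes are pushed
    to encode complementary, denoised information."""
    new = prototypes.copy()
    for k, emb in enumerate(task_embeds):
        if k == target_idx:  # masked: target task's LoRA is excluded
            continue
        new[k] += lr * (emb.mean(axis=0) - new[k])
    return new
```

In this toy setup, a low routing entropy means the router confidently selects a few adapters; MAPLE's premise is that denoising the prototypes (here, via masking during the update) makes that selection meaningful for OOD inputs rather than an artifact of shared noisy subspaces.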
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9510