MuRA: Multi-Rank Adaptation for Efficient and Effective Test-Time Vision-Language Generalization

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Test-Time Adaptation, Zero-Shot Generalization, Vision-Language Model
Abstract: Vision-language models (VLMs) have demonstrated remarkable zero-shot capabilities, but their performance degrades significantly when encountering distribution shifts. Recently, test-time adaptation (TTA) methods have been introduced to enhance VLMs' generalization ability. Among these methods, knowledge-adaptive approaches that incorporate Low-Rank Adaptation (LoRA) into vision models show relatively limited improvement compared to other TTA strategies. Our investigation reveals that the fundamental limitation stems from LoRA's static rank configuration, as visual inputs with varying information densities inherently require different ranks for optimal adaptation. To address this challenge, we propose Multi-Rank Adaptation (MuRA), a dynamic rank selection mechanism that adapts to varying data distributions. MuRA achieves state-of-the-art performance on domain generalization and cross-dataset benchmarks. By restricting adaptation to only the deepest layer, MuRA shortens the gradient backpropagation path, thereby significantly reducing both computational and memory overhead. Our method represents an efficient and effective approach to test-time vision-language generalization. Our code will be released as soon as possible.
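The core idea described in the abstract — several LoRA branches of different ranks, with a per-input signal deciding which rank to use — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the branch layout, the variance-based "information density" proxy, and the thresholds are all assumptions for demonstration purposes.

```python
import numpy as np

# Hypothetical sketch of multi-rank adaptation: multiple LoRA branches of
# different ranks share one frozen weight, and a per-input score (here,
# feature variance as a stand-in for "information density") selects the rank.
# All names and the selection heuristic are illustrative assumptions.

rng = np.random.default_rng(0)
D_IN, D_OUT = 16, 16
RANKS = [2, 4, 8]  # candidate ranks

W = rng.standard_normal((D_OUT, D_IN)) * 0.1  # frozen base weight

# One (A, B) pair per candidate rank; B starts at zero, as in standard LoRA,
# so adaptation begins as an identity perturbation.
lora = {r: (rng.standard_normal((r, D_IN)) * 0.01, np.zeros((D_OUT, r)))
        for r in RANKS}

def select_rank(x):
    """Map an input's feature variance to a rank: denser inputs -> higher rank."""
    v = float(np.var(x))
    if v < 0.5:
        return RANKS[0]
    if v < 1.5:
        return RANKS[1]
    return RANKS[2]

def forward(x):
    """Frozen forward pass plus the low-rank update of the selected branch."""
    r = select_rank(x)
    A, B = lora[r]
    return W @ x + B @ (A @ x), r

# Deterministic probe inputs with increasing variance.
x_flat = np.full(D_IN, 0.1)                 # variance 0.00 -> rank 2
x_mid = np.linspace(-1.5, 1.5, D_IN)        # variance 0.85 -> rank 4
x_rich = np.arange(D_IN, dtype=float)       # variance 21.25 -> rank 8

print([forward(x)[1] for x in (x_flat, x_mid, x_rich)])  # -> [2, 4, 8]
```

Only the (A, B) pairs of the chosen branch would receive gradients at test time; restricting them to the deepest layer, as the abstract notes, keeps the backpropagation path short.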
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6781