Keywords: Test-Time Training, Vision-Language Models, Low-Rank Adaptation
Abstract: We propose LoRA-TTT, a novel test-time training (TTT) method for vision-language models (VLMs) that leverages Low-Rank Adaptation (LoRA) applied exclusively to the image encoder. Unlike prior TTT approaches that rely on computationally intensive text prompt tuning and entropy-based losses, LoRA-TTT updates only the LoRA parameters at test time, achieving substantial performance gains with minimal memory and runtime overhead. We also introduce an efficient reconstruction loss tailored for TTT. Experiments on 15 datasets show that LoRA-TTT improves the zero-shot top-1 accuracy of CLIP-ViT-B/16 by 5.79% on OOD benchmarks and 1.36% on fine-grained benchmarks, without using external models or caches.
Supplementary Material: pdf
Submission Number: 60
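The core idea in the abstract — freezing the pretrained weights and taking test-time gradient steps on only a pair of low-rank matrices under a reconstruction loss — can be illustrated with a toy numpy sketch. Everything here is an assumption for illustration: a single linear layer stands in for the CLIP image encoder, the rank, dimensions, and learning rate are arbitrary, and an identity-reconstruction target replaces the paper's actual reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 16, 2, 8  # feature dim, LoRA rank, test-batch size (toy values)

W = rng.normal(size=(d, d)) * 0.1   # frozen pretrained weight (never updated)
A = rng.normal(size=(d, r)) * 0.1   # LoRA down-projection (trainable)
B = np.zeros((r, d))                # LoRA up-projection, zero-init as in LoRA

x = rng.normal(size=(n, d))         # one unlabeled test-time batch

def forward(x):
    # Adapted layer: frozen W plus the low-rank update A @ B.
    return x @ (W + A @ B)

def recon_loss(x):
    # Toy self-supervised objective: reconstruct the input itself.
    err = forward(x) - x
    return (err ** 2).mean(), err

lr = 0.2
loss0, _ = recon_loss(x)
for _ in range(300):
    loss, err = recon_loss(x)
    g = (2.0 / err.size) * (x.T @ err)  # gradient w.r.t. the full matrix W + A@B
    gA, gB = g @ B.T, A.T @ g           # chain rule into the two LoRA factors
    A -= lr * gA                        # only A and B change; W stays frozen
    B -= lr * gB
loss1, _ = recon_loss(x)
```

Because only `A` (d×r) and `B` (r×d) receive updates, the per-step cost and optimizer state scale with `2*d*r` rather than `d*d`, which is the source of the memory and runtime savings the abstract claims.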