LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models

Published: 10 Jun 2025, Last Modified: 11 Jul 2025 · ICML 2025 Poster · CC BY 4.0
Keywords: Test-Time Training, Vision-Language Models, Low-Rank Adaptation
Abstract: We propose LoRA-TTT, a novel test-time training (TTT) method for vision-language models (VLMs) that leverages Low-Rank Adaptation (LoRA), applied exclusively to the image encoder. Unlike prior TTT approaches that rely on computationally intensive text prompt tuning and entropy-based loss, LoRA-TTT updates only LoRA parameters at test time, achieving substantial performance gains with minimal memory and runtime overhead. We also introduce an efficient reconstruction loss tailored for TTT. Experiments on 15 datasets show that LoRA-TTT improves zero-shot top-1 accuracy of CLIP-ViT-B/16 by 5.79\% on OOD and 1.36\% on fine-grained benchmarks, without using external models or caches.
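The core mechanism described above, adapting only low-rank LoRA parameters while the backbone stays frozen, can be sketched in a few lines. This is a minimal, illustrative pure-Python sketch of the standard LoRA parameterization (effective weight W + (alpha/r)·B·A), not the authors' implementation; all function names and shapes are assumptions for exposition.

```python
# Minimal sketch of the LoRA parameterization underlying LoRA-TTT:
# a frozen weight W is augmented with a low-rank update B @ A, and at
# test time only A and B would be optimized. Names are illustrative.

def matmul(X, Y):
    """Plain-Python matrix product of X (m x k) and Y (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x, alpha=1.0):
    """Apply the effective weight W + (alpha/r) * B @ A to input vector x.

    W: frozen base weight (m x n), A: (r x n), B: (m x r), r = LoRA rank.
    """
    r = len(A)                      # LoRA rank
    BA = matmul(B, A)               # low-rank update, shape (m x n)
    scale = alpha / r
    W_eff = [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_eff]
```

With B initialized to zeros (LoRA's standard initialization), the forward pass reduces to the frozen model's output, so test-time training starts exactly from the zero-shot CLIP predictions and only departs from them as A and B are updated.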
Supplementary Material: pdf
Submission Number: 60