Keywords: Test Time Adaptation, Vision Language Models
Abstract: Test-time adaptation (TTA) has emerged as a promising paradigm for bridging the distribution gap between pretraining and test data in vision-language models (VLMs). Unfortunately, existing methods either assume a static target-domain distribution or rely on only a small subset of samples, and thus fail to adapt to continuously evolving real-world distributions. In this work, we propose Continuous Test-Time Adaptation (C-TTA), which adapts to the entire target-domain distribution via a continuously updated target prototype that adaptively incorporates visual features from incoming unlabeled test samples according to their class confidence. It is worth highlighting that C-TTA updates only a simple target prototype, which circumvents the heavy backpropagation and large cache accesses required by previous methods. This endows C-TTA with extremely high efficiency while achieving state-of-the-art performance on 15 image classification benchmarks. For example, C-TTA outperforms all existing training-required methods in cross-dataset generalization, while achieving 5.7\(\times\) faster inference than the cache-based TDA on ImageNet. Beyond image classification, C-TTA can be easily applied to 3D VLMs, achieving significant performance gains on 4 challenging point cloud analysis benchmarks.
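The abstract's core mechanism, a target prototype continuously updated with confidence-weighted visual features, can be illustrated with a minimal sketch. This is not the paper's actual update rule (which is not given here); it assumes a simple exponential moving average whose rate is scaled by the sample's class confidence, with all names (`update_prototype`, `base_rate`) hypothetical.

```python
import numpy as np

def update_prototype(prototype, feature, confidence, base_rate=0.1):
    """Illustrative confidence-weighted EMA update of a target prototype.

    Low-confidence test samples contribute less to the prototype, so the
    update adapts to the evolving test stream without backpropagation
    or a feature cache. All details here are assumptions for illustration.
    """
    rate = base_rate * confidence          # scale update strength by confidence
    updated = (1.0 - rate) * prototype + rate * feature
    return updated / np.linalg.norm(updated)  # keep unit norm for cosine similarity

# Toy stream of unlabeled test features with pseudo-label confidences.
proto = np.ones(4) / 2.0                   # unit-norm initialization
stream = [
    (np.array([1.0, 0.0, 0.0, 0.0]), 0.9), # confident sample: large pull
    (np.array([0.0, 1.0, 0.0, 0.0]), 0.2), # uncertain sample: small pull
]
for feat, conf in stream:
    proto = update_prototype(proto, feat, conf)
```

Because each step touches only a single vector, this kind of update is O(d) per sample, which is consistent with the abstract's claim of avoiding backpropagation and cache lookups.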
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 5533