Keywords: Test Time Adaptation, Vision Language Models
Abstract: Test-time adaptation (TTA) has emerged as a promising paradigm for bridging the distribution gap between pretraining and test data in vision-language models (VLMs). Unfortunately, existing methods either assume a static target-domain distribution or rely on only a small subset of samples, and thus fail to adapt to continuously evolving real-world distributions. In this work, we propose Continuous Test-Time Adaptation (C-TTA), which adapts to the entire target-domain distribution via a continuously updated target prototype that adaptively incorporates visual features from incoming unlabeled test samples according to their class confidence. It is worth highlighting that C-TTA updates only a simple target prototype, which circumvents the heavy backpropagation and large cache accesses required by previous methods. This endows C-TTA with extremely high efficiency while achieving state-of-the-art performance on 15 image classification benchmarks. For example, C-TTA outperforms all existing training-required methods in cross-dataset generalization, while achieving 5.7\(\times\) faster inference than the cache-based TDA on ImageNet. Beyond image classification, C-TTA can be easily applied to 3D VLMs, achieving significant performance gains on 4 challenging point cloud analysis benchmarks.
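The abstract's core mechanism, a target prototype continuously updated with confidence-weighted visual features, can be illustrated with a minimal sketch. This is not the paper's actual update rule (which is not given here); it assumes a simple exponential moving average whose rate is scaled by the sample's class confidence, with all names (`update_prototype`, `base_rate`) hypothetical.

```python
import numpy as np

def update_prototype(prototype, feature, confidence, base_rate=0.1):
    """Illustrative confidence-weighted EMA update of a target prototype.

    Low-confidence test samples contribute less to the prototype, so the
    update adapts to the evolving test stream without backpropagation
    or a feature cache. All details here are assumptions for illustration.
    """
    rate = base_rate * confidence          # scale update strength by confidence
    updated = (1.0 - rate) * prototype + rate * feature
    return updated / np.linalg.norm(updated)  # keep unit norm for cosine similarity

# Toy stream of unlabeled test features with pseudo-label confidences.
proto = np.ones(4) / 2.0                   # unit-norm initialization
stream = [
    (np.array([1.0, 0.0, 0.0, 0.0]), 0.9), # confident sample: large pull
    (np.array([0.0, 1.0, 0.0, 0.0]), 0.2), # uncertain sample: small pull
]
for feat, conf in stream:
    proto = update_prototype(proto, feat, conf)
```

Because each step touches only a single vector, this kind of update is O(d) per sample, which is consistent with the abstract's claim of avoiding backpropagation and cache lookups.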
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 5533