Empowering Test-Time Adaptation with Complementary Vision-Language Knowledge in Open-World Scenarios

20 Sept 2025 (modified: 10 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Test-Time Adaptation, Vision-Language Model, Open-World Test-Time Adaptation
TL;DR: A vision-language empowered framework that leverages transferable VLM knowledge to strengthen out-of-distribution filtering and semantics-boosted adaptation of discriminative models, enabling robust test-time adaptation in open-world scenarios.
Abstract: Test-time adaptation in open-world scenarios (OWTTA), which must address both domain discrepancy and semantic variance, has gained increasing attention for enabling models to adapt dynamically during inference. Existing approaches rely mainly on discriminative models, whose over-specialized knowledge restricts their adaptability in open-world settings. In contrast, vision-language models (VLMs), trained on diverse large-scale data, provide broader and more transferable knowledge, yet their role in OWTTA remains underexplored. In this work, we propose a VLM-empowered framework, termed Vision-Language knowledge Boosted Open-world test-time adaptation (VLBO). Specifically, by casting OWTTA in a probabilistic perspective, we first propose agreement-boosted filtering (AF), in which the discriminative model assumes the primary role of filtering out out-of-distribution samples, while the VLM provides a reinforcing signal that refines this process based on its agreement with the discriminative model. We then introduce semantics-boosted adaptation (SA), where VLM-extracted representations serve as semantic guidance to enhance the discriminative model's adaptation to target domains. This unified framework leverages the complementary strengths of vision-language and discriminative models, enabling robust and effective adaptation in open-world scenarios. Extensive experiments across multiple benchmarks demonstrate the consistent effectiveness of the proposed method.
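The abstract describes AF and SA only at a high level, so the PyTorch-style sketch below is one plausible reading rather than the authors' implementation. Everything specific here is an assumption of the sketch: the max-softmax confidence as the primary OOD score, the agreement measure (the VLM's probability mass on the discriminative prediction), the convex combination with weight `alpha`, the threshold `tau`, and the cosine-alignment loss for SA (which further assumes VLM features have been projected to the discriminative model's feature dimension).

```python
import torch
import torch.nn.functional as F

def agreement_boosted_filter(disc_logits, vlm_logits, tau=0.5, alpha=0.5):
    """Illustrative sketch of agreement-boosted filtering (AF).

    The discriminative model's confidence serves as the primary OOD score,
    refined by how strongly the VLM agrees with the discriminative
    prediction. All weighting choices are assumptions, not the paper's method.
    """
    disc_probs = F.softmax(disc_logits, dim=-1)           # (B, C)
    vlm_probs = F.softmax(vlm_logits, dim=-1)             # (B, C)

    # Primary signal: max softmax probability of the discriminative model.
    disc_conf = disc_probs.max(dim=-1).values             # (B,)

    # Reinforcing signal (assumed form): probability the VLM assigns to the
    # discriminative model's predicted class.
    pred = disc_probs.argmax(dim=-1)                      # (B,)
    agreement = vlm_probs.gather(1, pred.unsqueeze(1)).squeeze(1)  # (B,)

    # Combined score; samples below tau are treated as out-of-distribution.
    score = alpha * disc_conf + (1 - alpha) * agreement
    return score >= tau, score                            # keep-mask, scores

def semantics_boosted_loss(disc_feats, vlm_feats):
    """Illustrative sketch of semantics-boosted adaptation (SA): pull the
    discriminative model's features toward the frozen VLM's representations,
    using cosine alignment as one possible semantic-guidance loss.
    """
    disc_feats = F.normalize(disc_feats, dim=-1)
    vlm_feats = F.normalize(vlm_feats.detach(), dim=-1)   # VLM stays frozen
    return (1 - (disc_feats * vlm_feats).sum(dim=-1)).mean()
```

In an OWTTA loop, such a scheme would apply `agreement_boosted_filter` to each incoming test batch and back-propagate `semantics_boosted_loss` only through the samples kept as in-distribution; the score combination and threshold above are placeholders for whatever the paper's probabilistic formulation prescribes.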
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 24358