Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Vision Language Models, Test Time Adaptation, Open set recognition
TL;DR: We propose ROSITA, a framework for Open-set Test Time Adaptation that equips Vision Language Models with the ability to say "I don't know" when presented with an unseen class sample.
Abstract: In dynamic real-world settings, models must adapt to changing data distributions, a challenge known as Test Time Adaptation (TTA). This becomes even more challenging when test samples arrive sequentially and the model must handle open-set conditions by distinguishing between known and unknown classes. Towards this goal, we propose ROSITA, a novel framework for Open-set Single Image Test Time Adaptation using Vision-Language Models (VLMs). To enable the separation of known and unknown classes, ROSITA employs a dedicated contrastive loss, termed the ReDUCe loss, which leverages feature banks storing reliable test samples. This approach enables efficient adaptation of known-class samples to domain shifts while equipping the model to accurately reject unfamiliar samples. Our method sets a new benchmark for this problem, validated through extensive experiments across diverse real-world test environments.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 83