DOTA: Distributional Test-Time Adaptation of Vision-Language Models

Zongbo Han; Jialong Yang; Junfan Li; Qinghua Hu; Qianli Xu; Mike Zheng Shou; Changqing Zhang

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Test-time, uncertainty, vision-language models

TL;DR: We propose a new Distributional Test-time Adaptation (DOTA) method, which continuously estimates the distribution of test samples and incorporates human-machine collaboration to handle uncertain samples.

Abstract: Vision-language foundation models (e.g., CLIP) have shown remarkable performance across a wide range of tasks. However, deploying these models may be unreliable when significant distribution gaps exist between the training and test data. The training-free test-time dynamic adapter (TDA) is a promising approach to this issue: it stores representative test samples to guide the classification of subsequent ones. However, TDA naively maintains only a limited number of reference samples in its cache, which leads to severe test-time catastrophic forgetting when the cache is updated by dropping samples. In this paper, we propose a simple yet effective method for DistributiOnal Test-time Adaptation (DOTA). Instead of naively memorizing representative test samples, DOTA continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment. The test-time posterior probabilities are then computed from the estimated distributions using Bayes' theorem for adaptation purposes. To further enhance adaptability to uncertain samples, we introduce a new human-machine collaboration paradigm that identifies uncertain samples, collects human feedback, and incorporates it into the DOTA framework. Extensive experiments validate that DOTA enables CLIP to continually learn, resulting in a significant improvement over current state-of-the-art methods.
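
The core idea in the abstract (estimate per-class distributions online, then classify with Bayes' theorem) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than the authors' implementation: the abstract does not state the distribution family, so the sketch uses diagonal Gaussians; the class name `OnlineGaussianClassifier`, the variance floor, and the use of soft class probabilities (e.g., zero-shot predictions) as update weights are all hypothetical.

```python
import math


class OnlineGaussianClassifier:
    """Hypothetical sketch: running per-class diagonal Gaussians.

    Assumption: class-conditional features are modeled as diagonal
    Gaussians; the abstract does not specify the distribution family.
    """

    def __init__(self, num_classes, dim, var_floor=1e-3):
        self.k, self.d = num_classes, dim
        self.var_floor = var_floor  # keeps variances away from zero
        self.weight = [0.0] * num_classes  # accumulated soft counts
        self.mean = [[0.0] * dim for _ in range(num_classes)]
        # Weighted sum of squared deviations from the running mean.
        self.m2 = [[0.0] * dim for _ in range(num_classes)]

    def update(self, x, probs):
        """Weighted incremental mean/variance update for every class.

        `probs` are soft labels for sample `x` (assumed here to come
        from the base model's prediction or from human feedback).
        """
        for c, w in enumerate(probs):
            if w <= 0.0:
                continue
            self.weight[c] += w
            for j in range(self.d):
                delta = x[j] - self.mean[c][j]
                self.mean[c][j] += (w / self.weight[c]) * delta
                self.m2[c][j] += w * delta * (x[j] - self.mean[c][j])

    def log_likelihood(self, x, c):
        """log p(x | c) under the class-c diagonal Gaussian."""
        total = 0.0
        for j in range(self.d):
            var = max(self.m2[c][j] / self.weight[c], self.var_floor)
            diff = x[j] - self.mean[c][j]
            total -= 0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
        return total

    def posterior(self, x, prior):
        """Bayes' theorem in log space: p(c | x) is proportional to p(x | c) p(c).

        Classes with no observations yet get probability 0; if no class
        has been observed, the prior is returned unchanged.
        """
        seen = [c for c in range(self.k) if self.weight[c] > 0.0]
        if not seen:
            return list(prior)
        scores = {c: self.log_likelihood(x, c) + math.log(prior[c]) for c in seen}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = [math.exp(scores[c] - top) if c in scores else 0.0
                for c in range(self.k)]
        z = sum(exps)
        return [e / z for e in exps]


# Toy usage: two well-separated classes in a 2-D feature space.
clf = OnlineGaussianClassifier(num_classes=2, dim=2)
for x in ([0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]):
    clf.update(x, [1.0, 0.0])
for x in ([5.0, 5.0], [5.1, 4.9], [4.9, 5.2]):
    clf.update(x, [0.0, 1.0])
post = clf.posterior([0.1, 0.0], prior=[0.5, 0.5])
```

Because the statistics are updated incrementally, nothing is evicted as new samples arrive, which is the contrast with a fixed-size cache that the abstract draws. A human-feedback step could, under the same assumptions, be modeled by calling `update` with a one-hot label for samples whose posterior is judged too uncertain.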

Supplementary Material: zip

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 10632
