Efficient Test-Time Prompt Tuning for Vision-Language Models

21 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Vision-Language Models, Zero-shot Generalization, Prompt Learning, Test-Time Adaptation
TL;DR: An efficient test-time prompt adaptation method using exclusive language-branch self-supervision.
Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods often require per-image prompt adaptation during inference, which is computationally intensive and limits scalability and deployment. To address this issue, we introduce a novel framework: Self-supervised learning for efficient Test-time Prompt Tuning (Self-TPT). The key feature of Self-TPT is its shift to efficient \textit{predefined class adaptation} through self-supervised learning, thereby avoiding the computation-heavy \textit{per-image adaptation} at inference. Self-TPT first co-trains the self-supervised and supervised tasks on source data, then applies the self-supervision exclusively to adapt to new classes before making predictions. Specifically, we propose Contrastive Prompt Tuning (CPT) as the core self-supervised task. CPT is designed to minimize intra-class distances while enhancing inter-class distinguishability via contrastive learning. Empirical evidence suggests that CPT can partially mimic supervised learning in terms of gradients, providing a plausible explanation for its effectiveness. Motivated by this finding, we introduce a gradient matching loss to explicitly enhance gradient similarity. We evaluate Self-TPT across three challenging zero-shot benchmarks. The results consistently show that Self-TPT not only significantly reduces inference costs but also achieves state-of-the-art performance, effectively balancing the efficiency-efficacy trade-off.
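To make the two objectives in the abstract concrete, below is a minimal PyTorch-style sketch of (i) a supervised-contrastive loss over prompted class-text embeddings that pulls prompt views of the same class together while pushing different classes apart, and (ii) a cosine-based gradient matching term between the self-supervised and supervised losses. All names (`text_features`, `prompt_params`, `tau`, the loss weighting, etc.) are illustrative assumptions, not the authors' actual code or API.

```python
# Minimal sketch, assuming each class name is encoded under several learnable
# prompt "views"; this is NOT the authors' implementation of CPT.
import torch
import torch.nn.functional as F

def contrastive_prompt_loss(text_features, labels, tau=0.07):
    """Contrastive loss over prompted class texts.

    text_features: (N, D) text embeddings, several rows per class (one per prompt view).
    labels: (N,) class index of each prompted text.
    """
    feats = F.normalize(text_features, dim=-1)
    sim = feats @ feats.t() / tau                         # pairwise cosine similarities
    n = feats.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, -1e9)                # drop self-pairs from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # pull same-class prompt views together, push other classes apart
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

def gradient_matching_loss(loss_ssl, loss_sup, prompt_params):
    """Encourage the self-supervised gradient to align with the supervised one
    (cosine distance over flattened gradients of the shared prompt parameters)."""
    g_ssl = torch.autograd.grad(loss_ssl, prompt_params, create_graph=True, retain_graph=True)
    g_sup = torch.autograd.grad(loss_sup, prompt_params, retain_graph=True)
    g_ssl = torch.cat([g.reshape(-1) for g in g_ssl])
    g_sup = torch.cat([g.reshape(-1) for g in g_sup]).detach()
    return 1.0 - F.cosine_similarity(g_ssl, g_sup, dim=0)
```

In this sketch, source-data co-training would optimize something like `loss_sup + a * contrastive_prompt_loss(...) + b * gradient_matching_loss(...)`, while at test time only the contrastive term would be minimized over the new class names before classification; the exact weighting and schedule here are assumptions rather than details taken from the submission.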
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2299