Keywords: Prompt Learning, Test-Time Adaptation, Vision-Language Models
Abstract: Recent advances in vision-language models (VLMs) have demonstrated strong generalization across a broad range of tasks through prompt learning. However, bridging the distribution shift between training and test data remains a significant challenge. Existing research leverages multiple augmented views of each test sample for zero-shot adaptation. While effective, these approaches focus solely on global visual information and neglect the local contextual details of test images. Moreover, simplistic, single-form textual descriptions limit the understanding of visual concepts, hindering transfer performance on classes with similar or complex visual features. In this paper, we propose a Multi-Perspective Test-Time Prompt Tuning method, MP-TPT, built on two key insights: local visual perception and class-specific description augmentation. Specifically, we introduce local visual representations from VLMs during the optimization process to enhance the prompts' ability to perceive local context. In addition, we design a data augmentation method at the text-feature level that imparts regional visual priors to specific class texts, thereby enriching the class-specific descriptions. Furthermore, we fuse these multiple perspectives during inference, integrating both local and global visual representations with text features for a deeper understanding of visual concepts. Through extensive experiments on 15 benchmark datasets, we demonstrate the advantages of MP-TPT, achieving a 1% accuracy improvement over the state-of-the-art TPT method in cross-dataset settings, along with a 4.5× speedup in inference.
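To make the pipeline described in the abstract concrete, below is a minimal PyTorch-style sketch of one test-time tuning step. It assumes a hypothetical CLIP-like interface (`clip_model`, `prompt_learner`) whose image encoder returns both a global feature and patch-level local features; the regional-prior text augmentation shown here is a simplified stand-in for the paper's method, and the entropy-minimization objective follows the standard TPT formulation. None of these names or details come from the actual MP-TPT implementation.

```python
# Hypothetical sketch of one MP-TPT-style adaptation step; not the authors' code.
import torch
import torch.nn.functional as F

def mp_tpt_step(clip_model, prompt_learner, optimizer, image_views):
    """One test-time tuning step over augmented views of a single test image."""
    # Encode class prompts (learnable context tokens + class-name tokens).
    text_feats = clip_model.encode_text(prompt_learner())             # [C, D]

    # Global feature and local patch features for each augmented view
    # (assumed interface: the encoder exposes both).
    global_feats, local_feats = clip_model.encode_image(image_views)  # [V, D], [V, P, D]

    # Simplified text-level augmentation: inject a pooled regional visual
    # prior into the class text features.
    regional_prior = local_feats.mean(dim=(0, 1))                     # [D]
    aug_text_feats = F.normalize(text_feats + regional_prior, dim=-1)

    # Logits from both the global and the local perspective.
    g = F.normalize(global_feats, dim=-1) @ aug_text_feats.t()        # [V, C]
    l = F.normalize(local_feats.mean(1), dim=-1) @ aug_text_feats.t() # [V, C]
    logits = (g + l) / 2

    # TPT-style objective: minimize entropy of the view-averaged prediction.
    probs = logits.softmax(dim=-1).mean(0)
    loss = -(probs * probs.clamp_min(1e-8).log()).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.mean(0)  # fused prediction over all views
```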
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7493