Enhancing Vision-Language Prompt Learning through Image-Text Distribution Alignment

19 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Domain adaptation, CLIP, Prompt learning
Abstract: Large vision-language models (VLMs) such as CLIP have demonstrated impressive performance on zero-shot image classification tasks. These models usually leverage prompts to align the text and image distributions. However, existing prompting techniques are limited either in interpretability or in their ability to dynamically align distributions: discrete prompt learning methods cannot effectively perform dynamic distribution alignment, while soft prompt learning methods have very limited interpretability, making them difficult to understand and improve. To address both issues jointly, we leverage interpretable descriptions to facilitate soft prompt learning. In this paper, we introduce a novel training-free strategy that mitigates the distribution gap between plain text and the image-text corpus, leveraging the power of pretrained models such as GPT-3 to improve image classification performance. Furthermore, we propose a new few-shot learning pipeline that combines prompt learning with a reweighting strategy to dynamically mitigate the gap between the image and text distributions. This method overcomes the limitations of existing prompting techniques and offers a more effective and interpretable solution for image classification. Extensive experiments demonstrate the effectiveness of our method and illustrate the interpretability of our descriptions.
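The abstract describes scoring images against interpretable per-class descriptions, with a reweighting step over those descriptions. The sketch below illustrates that general idea in a minimal, assumption-laden form: it is not the paper's actual method, and the function `classify_by_descriptions`, the embedding shapes, and the uniform-default weights are all illustrative choices. Real CLIP image/text embeddings would replace the toy vectors.

```python
import numpy as np

def l2_normalize(x):
    # Normalize rows to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def classify_by_descriptions(image_emb, class_desc_embs, weights=None):
    """Score an image embedding against each class via its descriptions.

    image_emb:       (d,) image embedding (e.g. from a CLIP image encoder).
    class_desc_embs: dict mapping class name -> (n_desc, d) array of text
                     embeddings, one per generated description (hypothetical
                     stand-in for GPT-3-produced descriptions).
    weights:         optional dict mapping class name -> (n_desc,) weights,
                     an illustrative version of description reweighting;
                     uniform if None.
    Returns (predicted class name, dict of per-class scores).
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    scores = {}
    for cls, embs in class_desc_embs.items():
        sims = l2_normalize(np.asarray(embs, dtype=float)) @ img  # (n_desc,)
        w = np.ones(len(sims)) if weights is None else np.asarray(weights[cls], dtype=float)
        w = w / w.sum()
        scores[cls] = float(sims @ w)  # weighted average similarity
    pred = max(scores, key=scores.get)
    return pred, scores

# Toy usage with hand-made 3-d "embeddings": the image vector is closest
# to the "cat" descriptions, so "cat" should win.
cat_descs = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
dog_descs = np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
pred, scores = classify_by_descriptions(
    [1.0, 0.0, 0.0], {"cat": cat_descs, "dog": dog_descs}
)
print(pred)
```

With class-specific weights supplied, descriptions that better separate a class can be emphasized, which is one plausible reading of the dynamic reweighting the abstract alludes to.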
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1939