Multi-Vision Multi-Prompt for Few-Shot Learning in Vision-Language Model

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Few-shot Learning, Vision-Language Models, Transfer Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: The paper introduces Multi-Vision Multi-Prompt (MVMP), a novel method that enhances few-shot learning in the CLIP model by combining multiple prompts with image augmentation, outperforming the state of the art with accuracy gains of 2% to 4.6%.
Abstract: In vision-language models such as CLIP (Contrastive Language-Image Pre-Training), prompt learning enables efficient and rapid adaptation to specific tasks in few-shot settings. Previous prompt-learning methods often rely on a single prompt, but a single prompt may not accurately distinguish between categories, especially when a category has multiple features and contextual connections in a few-shot learning environment. Although few-shot performance can be improved through meta-learning or image augmentation strategies, these approaches may increase computational cost and degrade accuracy. To address these issues, we propose Multi-Vision Multi-Prompt (MVMP), a new method designed for CLIP in the few-shot learning setting. Instead of increasing the number of model parameters, MVMP employs multiple prompts at different stages of the training process and averages their predictions. We further introduce a mixed self-augmentation framework and text distillation to enhance the model's performance. Extensive experiments demonstrate that our approach significantly outperforms the state of the art on few-shot classification tasks, improving accuracy by 4.6% and 2%.
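To make the multi-prompt idea in the abstract concrete, the sketch below shows one plausible way to average class predictions over several learnable prompts in a CLIP-style model. This is not the authors' implementation: the `MultiPromptHead` class, the `encode_text_with_context` callable, and the feature dimension are hypothetical placeholders standing in for a real CLIP text/image encoder pipeline.

```python
# Illustrative sketch only (not the paper's code): averaging class logits
# over several learnable prompt contexts in a CLIP-style few-shot classifier.
import torch
import torch.nn as nn


class MultiPromptHead(nn.Module):
    """Keeps multiple learnable prompt contexts and averages the resulting
    class logits, mimicking the multi-prompt averaging described above."""

    def __init__(self, num_prompts: int = 4, ctx_len: int = 8, dim: int = 512):
        super().__init__()
        # One learnable context per prompt; in CLIP-style prompt learning these
        # vectors would be prepended to the tokenized class name before the
        # text encoder is applied.
        self.contexts = nn.Parameter(torch.randn(num_prompts, ctx_len, dim) * 0.02)

    def forward(self, image_features: torch.Tensor, encode_text_with_context):
        # image_features: (batch, dim), assumed L2-normalized.
        # encode_text_with_context: hypothetical stand-in for a CLIP text
        # encoder that takes one learnable context (ctx_len, dim) and returns
        # (num_classes, dim) L2-normalized class embeddings.
        logits_per_prompt = []
        for ctx in self.contexts:
            text_features = encode_text_with_context(ctx)
            logits_per_prompt.append(100.0 * image_features @ text_features.t())
        # Average the prediction scores from all prompts rather than relying
        # on a single prompt.
        return torch.stack(logits_per_prompt, dim=0).mean(dim=0)
```

Under these assumptions, the head adds only the prompt-context parameters rather than new encoder weights, which is consistent with the abstract's claim of not increasing the number of model parameters substantially; the paper's additional components (mixed self-augmentation and text distillation) are not shown here.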
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1541