Abstract: Pre-trained Vision-Language Models (VLMs) have demonstrated strong performance on a variety of downstream tasks. Recently, prompt tuning methods, represented by Context Optimization (CoOp), have effectively adapted VLMs to few-shot tasks. However, CoOp-based methods tend to overfit to the base classes, which impairs the model's generalization to new classes. Since meta-learning excels at generalizing to new classes, we combine meta-learning with CoOp-style vision-language fine-tuning to improve performance on few-shot generalization tasks. In this paper, we present a novel Meta-learning-based Multi-Textual Prompt tuning (MMTP) method, which leverages meta-learning to learn multiple textual prompts, enhancing the vision-language model's representation and generalization capabilities. Specifically, we introduce multiple textual prompts to enrich the model's representations and improve recognition of base classes. Simultaneously, we employ meta-learning to optimize prompt training, strengthening the model's generalization to new classes. Extensive experiments demonstrate the superiority of our method under both base-to-new generalization and cross-domain generalization settings. Furthermore, ablation studies validate the effectiveness of each component.
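To make the two ideas in the abstract concrete, the sketch below illustrates (1) an ensemble of multiple learnable textual prompt contexts and (2) a MAML-style episodic loop that adapts the prompts on a support split of base classes and meta-updates them from the loss on a held-out query split. This is a minimal, self-contained toy, not the authors' implementation: the linear text encoder, random image features, and all hyperparameters (N_PROMPTS, inner_lr, etc.) are illustrative stand-ins for a frozen CLIP backbone and real few-shot episodes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, N_PROMPTS, CTX_LEN, N_CLASSES = 512, 4, 16, 10

def class_features(ctx, text_encoder, cls_emb):
    # Encode each learnable prompt context, then ensemble (average) the
    # per-prompt text features before fusing them with class-name embeddings.
    feats = text_encoder(ctx.flatten(1))                 # (P, D)
    prompt_feat = F.normalize(feats, dim=-1).mean(0)     # (D,)
    return F.normalize(cls_emb + prompt_feat, dim=-1)    # (C, D), toy fusion

def logits(ctx, img_feats, text_encoder, cls_emb):
    # CLIP-style cosine-similarity classifier between image and text features.
    txt = class_features(ctx, text_encoder, cls_emb)
    img = F.normalize(img_feats, dim=-1)
    return 100.0 * img @ txt.t()

# Multiple learnable textual prompt contexts (CoOp learns one such context).
ctx = nn.Parameter(0.02 * torch.randn(N_PROMPTS, CTX_LEN, EMB_DIM))

# Frozen stand-in for the pre-trained text encoder and class-name embeddings.
text_encoder = nn.Linear(CTX_LEN * EMB_DIM, EMB_DIM)
for p in text_encoder.parameters():
    p.requires_grad_(False)
cls_emb = torch.randn(N_CLASSES, EMB_DIM)

meta_opt = torch.optim.SGD([ctx], lr=2e-3)
inner_lr = 1e-2

for episode in range(100):
    # Fake few-shot episode: support/query image features over base classes.
    sup_x, sup_y = torch.randn(20, EMB_DIM), torch.randint(0, N_CLASSES, (20,))
    qry_x, qry_y = torch.randn(20, EMB_DIM), torch.randint(0, N_CLASSES, (20,))

    # Inner loop: one gradient step adapting the prompts on the support split.
    sup_loss = F.cross_entropy(logits(ctx, sup_x, text_encoder, cls_emb), sup_y)
    grad, = torch.autograd.grad(sup_loss, ctx, create_graph=True)
    adapted_ctx = ctx - inner_lr * grad

    # Outer loop: the meta-objective is the adapted prompts' loss on the query
    # split; gradients flow back through the inner step to the initial `ctx`,
    # encouraging prompts that generalize beyond the classes they adapt on.
    qry_loss = F.cross_entropy(
        logits(adapted_ctx, qry_x, text_encoder, cls_emb), qry_y)
    meta_opt.zero_grad()
    qry_loss.backward()
    meta_opt.step()
```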