Abstract: Prompt learning for adapting pretrained vision–language models (VLMs) to downstream tasks has gained significant attention because it reduces training costs compared to model fine-tuning under few-shot settings. Most existing methods rely on a universal prompt for all classes, as it generally delivers consistent performance across various datasets. However, a universal prompt cannot capture class-specific discriminative information. To overcome this limitation, we propose class-specific prompt learning (CPL). CPL represents the context of a prompt using two components: a base vector shared among all classes and a class-specific vector designed for individual classes. This method combines the generalization ability of the base context with the adaptability of the class-specific context. Furthermore, we introduce contrastive CPL, which enhances the ability of the prompt to capture discriminative features unique to each class. We also adopt a self-consistency loss to regularize the base context, further enhancing its generalization ability. As a result, CPL effectively learns tailored prompts for each class. Extensive experiments demonstrate that CPL achieves superior performance over existing methods in both base-class classification and new-class generalization.
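The abstract's core idea, composing each class's prompt context from a shared base vector and a class-specific vector, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tensor shapes, the additive combination rule, and all names here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 5 classes, 4 context tokens, dim-8 embeddings.
n_classes, n_ctx, dim = 5, 4, 8

# Base context: one set of learnable vectors shared by all classes,
# intended to carry generalizable information.
base_ctx = rng.normal(size=(n_ctx, dim))

# Class-specific context: a small learnable component per class,
# intended to carry discriminative, class-specific information.
class_ctx = rng.normal(scale=0.1, size=(n_classes, n_ctx, dim))

def build_prompt_context(class_idx: int) -> np.ndarray:
    """Combine shared and class-specific contexts for one class.

    Simple addition is one plausible combination rule (an assumption;
    the paper may compose the two components differently).
    """
    return base_ctx + class_ctx[class_idx]

# Every class gets its own prompt context, all anchored to the same base.
prompts = np.stack([build_prompt_context(i) for i in range(n_classes)])
```

In this sketch, the base component gives all prompts a common starting point, while the per-class offsets let each prompt specialize, mirroring the trade-off between generalization and adaptability described in the abstract.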
External IDs: dblp:journals/tnn/LiCWLT25