Class-Specific Prompt Learning for Vision-Language Models

Published: 2025 · Last Modified: 06 Nov 2025 · IEEE Trans. Neural Networks Learn. Syst. 2025 · CC BY-SA 4.0
Abstract: Learning prompts to adapt pretrained vision–language models (VLMs) to downstream tasks has gained significant attention, since few-shot prompt learning reduces training costs compared to full model fine-tuning. Most existing methods rely on a universal prompt for all classes, as it generally delivers consistent performance across various datasets. However, a universal prompt cannot capture class-specific discriminative information. To overcome this limitation, we propose class-specific prompt learning (CPL). CPL represents the context of a prompt using two components: a base vector shared among all classes and a class-specific vector designed for each individual class. This method combines the generalization ability of the base context with the adaptability of the class-specific context. Furthermore, we introduce contrastive CPL, which enhances the ability of the prompt to capture discriminative features unique to each class. We also adopt a self-consistency loss to regularize the base context, enhancing its generalization ability. As a result, CPL effectively learns tailored prompts for each class. Extensive experiments demonstrate that CPL achieves superior performance over existing methods in both base-class classification and new-class generalization.
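The core decomposition described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): each class's prompt context is formed by combining a shared base context with a learnable class-specific context; all array names and dimensions are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of the CPL context decomposition.
# Each class's prompt context = shared base context + class-specific context.
rng = np.random.default_rng(0)

n_classes, n_ctx, dim = 5, 4, 16   # classes, context tokens, embedding dim

base_ctx = rng.standard_normal((n_ctx, dim))              # shared among all classes
class_ctx = rng.standard_normal((n_classes, n_ctx, dim))  # one context per class

def class_prompt_context(c):
    """Context tokens for class c: shared base plus the class-specific vector."""
    return base_ctx + class_ctx[c]

# Stack all class-specific prompt contexts, one per class.
prompts = np.stack([class_prompt_context(c) for c in range(n_classes)])
print(prompts.shape)  # (5, 4, 16)
```

In training, both `base_ctx` and `class_ctx` would be optimized (e.g. with the contrastive and self-consistency losses mentioned above), while the VLM backbone stays frozen, which is what keeps the approach cheaper than full fine-tuning.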