Abstract: Prompt learning is a cutting-edge parameter-efficient fine-tuning technique for pre-trained vision-language models (VLMs). Instead of learning a single text prompt, recent works have revealed that learning diverse text prompts can effectively boost performance on downstream tasks, as the diverse prompted text features can comprehensively depict the visual concepts from different perspectives. However, diverse prompt learning demands enormous computational resources, and this efficiency issue remains largely unexplored. To achieve efficient and diverse prompt learning, this paper proposes a novel \textbf{Surrogate Prompt Learning (SurPL)} framework. Instead of learning diverse text prompts, SurPL directly generates the desired prompted text features via a lightweight \textbf{Surrogate Feature Generator (SFG)}, thereby avoiding the costly gradient computation of conventional diverse prompt learning. Concretely, starting from a basic prompted text feature, the SFG directly and efficiently generates diverse prompted features according to different pre-defined conditional signals. Extensive experiments demonstrate the effectiveness of the surrogate prompted text features and show the compelling performance and efficiency of SurPL on various benchmarks.
Lay Summary: We introduce Surrogate Prompt Learning (SurPL), a new method that makes adapting large vision–language models much more efficient. Traditional diverse prompt learning requires optimizing many separate text prompts, which is expensive in both time and compute. Instead of fine-tuning multiple prompts, SurPL uses a lightweight Surrogate Feature Generator to directly produce diverse prompted embeddings on demand, cutting out most of the heavy computation. This generator learns to take a single “base” prompt embedding and, guided by simple control signals, output many variations instantly. By shifting the work from gradient-based tuning to a small auxiliary network, SurPL achieves comparable performance on standard vision benchmarks while using far fewer resources.
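For intuition, here is a minimal PyTorch sketch of what such a surrogate generator could look like. The class name, architecture, feature dimensions, and the residual formulation are illustrative assumptions, not the released implementation; see the repository linked below for the authors' actual code.

```python
import torch
import torch.nn as nn

class SurrogateFeatureGenerator(nn.Module):
    """Hypothetical sketch: map a basic prompted text feature plus a
    pre-defined conditional signal to a surrogate prompted feature."""

    def __init__(self, feat_dim: int = 512, cond_dim: int = 64):
        super().__init__()
        # Lightweight conditioning network: fuse the base feature with the
        # condition embedding and predict a residual over the base feature.
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, base_feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # base_feat: (B, feat_dim) basic prompted text feature
        # cond:      (B, cond_dim) pre-defined conditional signal
        surrogate = base_feat + self.net(torch.cat([base_feat, cond], dim=-1))
        # L2-normalize, as CLIP-style text features typically are.
        return surrogate / surrogate.norm(dim=-1, keepdim=True)

# Usage: generate several diverse prompted features from one base feature,
# without additional passes through the text encoder.
sfg = SurrogateFeatureGenerator()
base = torch.randn(1, 512)                 # one basic prompted text feature
conds = torch.randn(4, 64)                 # four pre-defined condition signals
features = sfg(base.expand(4, -1), conds)  # four surrogate prompted features
```

The key design point is that each extra "prompt" costs only a small forward pass through this network, rather than a separate gradient-based optimization of prompt tokens through the full text encoder.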
Link To Code: https://github.com/llcllc1997/SurPL
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: transfer learning, prompt learning, vision-language model, parameter-efficient fine-tuning
Submission Number: 11408