Abstract: Prompt learning is a cutting-edge parameter-efficient fine-tuning technique for pre-trained vision-language models (VLMs). Instead of learning a single text prompt, recent works have revealed that learning diverse text prompts can effectively boost performance on downstream tasks, as the diverse prompted text features can comprehensively depict the visual concepts from different perspectives. However, diverse prompt learning demands enormous computational resources, and this efficiency issue remains largely unexplored. To achieve efficient and diverse prompt learning, this paper proposes a novel \textbf{Surrogate Prompt Learning (SurPL)} framework. Instead of learning diverse text prompts, SurPL directly generates the desired prompted text features via a lightweight \textbf{Surrogate Feature Generator (SFG)}, thereby avoiding the costly gradient computation of conventional diverse prompt learning. Concretely, starting from a basic prompted text feature, the SFG directly and efficiently generates diverse prompted features according to different pre-defined conditional signals. Extensive experiments demonstrate the effectiveness of the surrogate prompted text features and show the compelling performance and efficiency of SurPL on various benchmarks.
Lay Summary: We introduce Surrogate Prompt Learning (SurPL), a new method that makes adapting large vision–language models much more efficient. Traditional diverse prompt learning requires optimizing many separate text prompts, which is expensive in both time and compute. Instead of fine-tuning multiple prompts, SurPL uses a lightweight Surrogate Feature Generator to directly produce diverse prompted embeddings on demand, cutting out most of the heavy computation. This generator learns to take a single “base” prompt embedding and, guided by simple control signals, output many variations instantly. By shifting the work from gradient-based tuning to a small auxiliary network, SurPL achieves comparable performance on standard vision benchmarks while using far fewer resources.
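For intuition, here is a minimal PyTorch sketch of what such a surrogate generator could look like. The class name, architecture, feature dimensions, and the residual formulation are illustrative assumptions, not the released implementation; see the repository linked below for the authors' actual code.

```python
import torch
import torch.nn as nn

class SurrogateFeatureGenerator(nn.Module):
    """Hypothetical sketch: map a basic prompted text feature plus a
    pre-defined conditional signal to a surrogate prompted feature."""

    def __init__(self, feat_dim: int = 512, cond_dim: int = 64):
        super().__init__()
        # Lightweight conditioning network: fuse the base feature with the
        # condition embedding and predict a residual over the base feature.
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, base_feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # base_feat: (B, feat_dim) basic prompted text feature
        # cond:      (B, cond_dim) pre-defined conditional signal
        surrogate = base_feat + self.net(torch.cat([base_feat, cond], dim=-1))
        # L2-normalize, as CLIP-style text features typically are.
        return surrogate / surrogate.norm(dim=-1, keepdim=True)

# Usage: generate several diverse prompted features from one base feature,
# without additional passes through the text encoder.
sfg = SurrogateFeatureGenerator()
base = torch.randn(1, 512)                 # one basic prompted text feature
conds = torch.randn(4, 64)                 # four pre-defined condition signals
features = sfg(base.expand(4, -1), conds)  # four surrogate prompted features
```

The key design point is that each extra "prompt" costs only a small forward pass through this network, rather than a separate gradient-based optimization of prompt tokens through the full text encoder.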
Link To Code: https://github.com/llcllc1997/SurPL
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: transfer learning, prompt learning, vision-language model, parameter-efficient fine-tuning
Submission Number: 11408