ADAPT: Adaptive Prompt Tuning for Pre-Trained Vision-Language Models

TMLR Paper 6012 Authors

26 Sept 2025 (modified: 28 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Prompt tuning has emerged as an effective approach to parameter-efficient fine-tuning. Conventional deep prompt tuning inserts continuous prompts of a fixed context length into the input of each layer. When a pre-trained model is tailored to a specific downstream task, different layers initialized with pre-trained weights may deviate from the optimal weights to different degrees, so prompts with a fixed context length may carry redundant context tokens in some layers and too few in others. To address this issue, we propose a deep continuous prompting method, dubbed Adapt, that encourages heterogeneous context lengths. Context lengths are determined automatically by iteratively pruning context tokens: we use a saliency criterion from neural network pruning to compute importance scores for context tokens and decide which tokens to prune. To mitigate forgetting during fine-tuning, we apply angular knowledge distillation, which encourages the model to preserve the angular separation between pairs of classes and between instances learned by the pre-trained model. We evaluate the proposed method on the pre-trained vision-language model CLIP. Extensive experiments on 11 downstream datasets demonstrate the advantage of Adapt: the average test accuracy sets a new state of the art, and the largest performance gain on an individual dataset is 7.44%. We release the code at https://anonymous.4open.science/r/Adapt-Prompt-Release.
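The two mechanisms the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a first-order Taylor saliency score for each learned context token and a pairwise-cosine form of angular distillation, and all names (`prompt`, `features_student`, `features_teacher`, `keep_ratio`) are hypothetical.

```python
import torch
import torch.nn.functional as F


def taylor_saliency(prompt: torch.Tensor) -> torch.Tensor:
    """Per-token importance: |token * d(loss)/d(token)| summed over the embedding dim.

    `prompt` has shape (num_tokens, dim) and must already hold gradients
    from a backward pass on the task loss.
    """
    return (prompt * prompt.grad).abs().sum(dim=-1)


def prune_context_tokens(prompt: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only the highest-saliency context tokens, shortening the context length."""
    scores = taylor_saliency(prompt)
    num_keep = max(1, int(keep_ratio * prompt.size(0)))
    keep_idx = scores.topk(num_keep).indices.sort().values
    return prompt.detach()[keep_idx].clone().requires_grad_(True)


def angular_distillation_loss(features_student: torch.Tensor,
                              features_teacher: torch.Tensor) -> torch.Tensor:
    """Match the pairwise cosine (angular) structure of the frozen teacher's features."""
    s = F.normalize(features_student, dim=-1)
    t = F.normalize(features_teacher, dim=-1)
    return F.mse_loss(s @ s.T, t @ t.T)
```

In this reading, pruning would be interleaved with training (score, prune, continue fine-tuning), so each layer's prompt converges to its own context length, while the distillation term keeps the fine-tuned features angularly consistent with the pre-trained CLIP features.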
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yingnian_Wu1
Submission Number: 6012