PGMPL: Prototype-Guided Multi-modal Prompt Learning for Vision-Language Models

17 Sept 2025 (modified: 29 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Prompt Learning, Vision-Language Models, Transfer Learning
TL;DR: Prototype-Guided Multi-modal Prompt Learning for Vision-Language Models
Abstract: Vision-language models (VLMs) have been widely applied to various visual tasks due to their strong zero-shot transfer capabilities. However, their performance on downstream tasks often remains suboptimal. While fine-tuning can improve accuracy on base classes, it often compromises generalization to novel classes. To address this challenge, we propose Prototype-Guided Multi-modal Prompt Learning (PGMPL), which guides representation learning with a supervisory signal that summarizes intra-class information. Specifically, we construct a category-level prototype for each class by aggregating multi-image features with textual semantics. This prototype serves as a cross-modal, summarizing supervisory signal, strengthening image-text alignment and enhancing the generalization of the learned representations. To further optimize the prototypes and their guidance of representation learning, we refine multi-modal representations via prompt learning and introduce bidirectional cross-attention to alleviate the image-text matching inconsistency induced by the newly inserted prompts. Extensive experiments demonstrate the effectiveness of PGMPL, which achieves a higher overall harmonic mean of base and novel accuracy than state-of-the-art methods across 11 datasets. Our code is available at https://anonymous.4open.science/r/PGMPL.
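As a rough illustration of the prototype construction described in the abstract, the sketch below averages per-class image embeddings and fuses them with the corresponding class text embedding, then uses the resulting prototypes as a supervisory signal. The function names, the convex-combination fusion weight `alpha`, and the contrastive guidance loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def build_prototypes(image_feats, labels, text_feats, num_classes, alpha=0.5):
    """Aggregate per-class image features and fuse them with the class text
    feature to form a category-level prototype (hypothetical formulation;
    the simple averaging and the fusion weight `alpha` are assumptions).

    image_feats: (N, D) L2-normalized image embeddings
    labels:      (N,)   integer class labels
    text_feats:  (C, D) L2-normalized class text embeddings
    """
    device, D = image_feats.device, image_feats.size(1)
    proto = torch.zeros(num_classes, D, device=device)
    counts = torch.zeros(num_classes, 1, device=device)
    proto.index_add_(0, labels, image_feats)                      # sum image features per class
    counts.index_add_(0, labels, torch.ones(labels.size(0), 1, device=device))
    proto = proto / counts.clamp(min=1)                           # per-class mean image feature
    proto = alpha * proto + (1 - alpha) * text_feats              # fuse with textual semantics
    return F.normalize(proto, dim=-1)                             # cross-modal class prototypes

def prototype_guidance_loss(image_feats, labels, prototypes, tau=0.07):
    """Use the prototypes as a summarizing supervisory signal: a contrastive
    loss pulling each image embedding toward its class prototype (one
    plausible choice of guidance loss, not necessarily the one in PGMPL)."""
    logits = image_feats @ prototypes.t() / tau
    return F.cross_entropy(logits, labels)
```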
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 8732