Mixture of coarse and fine-grained prompt tuning for vision-language model

Yansheng Gao, Zixi Zhu, Sheng-Sheng Wang

Published: 2026, Last Modified: 26 Jul 2025. Pattern Recognit. 2026. CC BY-SA 4.0

Abstract: Highlights
• We propose a simple yet effective method, Mixture of Coarse and Fine-grained Prompt Tuning (MCFPT), to enhance the generalization and discriminative capabilities of Visual Language Models (VLMs) by leveraging the strengths of both fine-grained and coarse text prompts.
• We propose the Mixed Fusion Module, which introduces a Mixture-of-Experts (MoE) mechanism to fuse and select the coarse domain-shared and fine-grained category-discriminative text features, producing a mixed text feature.
• We propose the Dynamic Refinement Adapter (DRA), which uses the mixed text feature to refine the original text feature, keeping the two consistent and adjusting the category distribution.
• Extensive experiments show that MCFPT achieves competitive performance in base-to-new, few-shot classification, domain generalization, and cross-domain classification tasks.
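The highlights give no equations for the Mixed Fusion Module, so the following is only an illustrative sketch of how a generic MoE-style gate could blend a coarse and a fine-grained text feature. It is not the authors' implementation. The function name `fuse_text_features`, the linear gate over the concatenated features, and the parameters `gate_w` and `gate_b` are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_text_features(coarse, fine, gate_w, gate_b):
    """Gate-weighted mix of a coarse and a fine-grained text feature.

    Hypothetical sketch: a linear gate scores two "experts" (the coarse
    and the fine feature) from their concatenation, and a softmax turns
    the scores into mixing weights.

    coarse, fine: lists of floats of equal length d
    gate_w: two weight rows (one per expert), each of length 2 * d
    gate_b: two biases (one per expert)
    """
    gate_input = coarse + fine  # list concatenation, length 2 * d
    logits = [
        sum(w * x for w, x in zip(w_row, gate_input)) + b
        for w_row, b in zip(gate_w, gate_b)
    ]
    weights = softmax(logits)
    mixed = [weights[0] * c + weights[1] * f for c, f in zip(coarse, fine)]
    return mixed, weights


# With an all-zero gate, both experts receive equal weight.
coarse_feat = [1.0, 0.0]
fine_feat = [0.0, 1.0]
zero_gate = [[0.0] * 4, [0.0] * 4]
mixed_feat, gate_weights = fuse_text_features(
    coarse_feat, fine_feat, zero_gate, [0.0, 0.0]
)
```

In a trained model the gate parameters would be learned, so the mixing weights would vary with the input rather than staying fixed at an even split.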