Cross-coupled prompt learning for few-shot image recognition

Published: 2024, Last Modified: 15 Jan 2026, Displays 2024, CC BY-SA 4.0
Abstract: Highlights
• To the best of our knowledge, CCPG is the first to achieve cross-modal bidirectional interaction between visual and textual prompts.
• CCPG also reinforces cross-modal feature fusion between image and text embeddings, enabling stronger mutual exchange of informative representations.
• To achieve cross-modal interaction, we design a CCPG module that lets visual and textual prompts capture key information via a cross-attention mechanism.
• To reinforce cross-modal feature fusion, we design a CMF module that enhances semantic consistency between image and text via an ITM loss function.
• Extensive experiments show that CCPL surpasses single-modal and multi-modal prompt learning on various few-shot image recognition tasks.
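As a rough illustration of the bidirectional interaction described in the highlights, the sketch below lets a set of visual prompt vectors attend over textual prompt vectors and vice versa, then adds the attended context back as a residual. This is not the paper's CCPG module: the function names (`cross_attend`, `cross_couple`), the absence of learned projection matrices, and the residual update are all assumptions made to keep the example small and dependency-free.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    # Scaled dot-product attention: every query vector attends over
    # keys_values, which serve as both keys and values (no learned
    # projections here; that is a simplifying assumption).
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def cross_couple(visual_prompts, text_prompts):
    # Hypothetical bidirectional update: each modality's prompts receive
    # a residual context computed from the other modality's prompts.
    v_ctx = cross_attend(visual_prompts, text_prompts)
    t_ctx = cross_attend(text_prompts, visual_prompts)
    new_visual = [[a + b for a, b in zip(p, c)] for p, c in zip(visual_prompts, v_ctx)]
    new_text = [[a + b for a, b in zip(p, c)] for p, c in zip(text_prompts, t_ctx)]
    return new_visual, new_text


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy visual prompts
    text = [[0.5, -0.5]]                            # a single toy textual prompt
    new_visual, new_text = cross_couple(visual, text)
    print(new_visual)
    print(new_text)
```

Both directions read from the original prompts, so the update is symmetric; a real module would typically add learned query, key, and value projections and train them jointly with the rest of the model.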