Keywords: Image Fusion; Prompt Learning; Graph Network; Memory Bank
Abstract: Infrared and visible image fusion aims to integrate complementary information from different modalities into a unified representation. However, existing methods can neither leverage historical fusion experience nor generate modality-specific semantic guidance, which limits their adaptability and fusion quality. To address these challenges, this study proposes a novel Memory-Orchestrated Multi-Prompt Learning network that transforms fusion from a static feature-combination process into a dynamic prompt-guided learning paradigm. Our method comprises two core mechanisms: 1) Memory-driven experiential prompts that capture and reuse successful fusion patterns from historical cases through a CLIP-evaluated dynamic memory bank; 2) Graph-driven modality-specific prompts that model cross-modal semantic relationships via specialized semantic graph networks to generate targeted guidance for each modality. These dual prompts are jointly modulated across multiple scales and progressively integrated into the fusion process, providing stable, interpretable, and transferable guidance for fusion decisions without relying on strong supervision. Furthermore, we exploit residual priors to assess the salient complementarity of source features, constraining the solution space and strengthening the model's perception of complementary characteristics. Extensive experiments, covering both statistical metrics and downstream high-level vision tasks, demonstrate the effectiveness of the proposed method.
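To make the abstract's dual-prompt idea concrete, the sketch below illustrates one plausible reading of it: a small experiential memory bank that stores prompt vectors from past fusion cases (scored, e.g., by a CLIP-based quality measure) and a retrieval-plus-modulation step that conditions the current fused features on both the retrieved memory prompt and a modality-specific prompt. This is a minimal illustration, not the authors' implementation; all class names, dimensions, the cosine-similarity retrieval rule, and the scale-and-shift modulation are assumptions, and the CLIP scoring and graph-prompt modules are only stubbed.

```python
# Minimal sketch (assumed design, not the paper's code) of a memory-driven prompt
# bank plus dual-prompt modulation of fused features.
import torch
import torch.nn.functional as F


class ExperientialMemoryBank:
    """Fixed-size bank of (prompt, quality-score) pairs; lowest-scoring entries are evicted."""

    def __init__(self, capacity: int = 64, dim: int = 256):
        self.capacity = capacity
        self.prompts = torch.zeros(0, dim)  # stored prompt vectors from past fusion cases
        self.scores = torch.zeros(0)        # e.g. CLIP-derived fusion-quality scores (assumed)

    def write(self, prompt: torch.Tensor, score: float) -> None:
        # Append the new prompt, then keep only the top-`capacity` scoring entries.
        self.prompts = torch.cat([self.prompts, prompt.view(1, -1)], dim=0)
        self.scores = torch.cat([self.scores, torch.tensor([score])], dim=0)
        if self.prompts.shape[0] > self.capacity:
            keep = self.scores.topk(self.capacity).indices
            self.prompts, self.scores = self.prompts[keep], self.scores[keep]

    def read(self, query: torch.Tensor, k: int = 4) -> torch.Tensor:
        # Retrieve a soft mixture of the k stored prompts most similar to the query.
        if self.prompts.shape[0] == 0:
            return torch.zeros_like(query)
        sim = F.cosine_similarity(query.unsqueeze(0), self.prompts, dim=-1)
        top = sim.topk(min(k, self.prompts.shape[0]))
        weights = F.softmax(top.values, dim=0)
        return (weights.unsqueeze(-1) * self.prompts[top.indices]).sum(dim=0)


def modulate(fused_feat: torch.Tensor, mem_prompt: torch.Tensor,
             modal_prompt: torch.Tensor) -> torch.Tensor:
    """Jointly modulate fused features with the experiential and modality-specific
    prompts via a channel-wise scale-and-shift (one plausible modulation choice)."""
    scale = torch.sigmoid(mem_prompt).view(1, -1, 1, 1)
    shift = modal_prompt.view(1, -1, 1, 1)
    return fused_feat * scale + shift


if __name__ == "__main__":
    bank = ExperientialMemoryBank(capacity=8, dim=256)
    bank.write(torch.randn(256), score=0.83)   # a "successful" past fusion case
    query = torch.randn(256)                    # summary vector of the current scene (assumed)
    mem_prompt = bank.read(query)               # memory-driven experiential prompt
    modal_prompt = torch.randn(256)             # stand-in for a graph-driven modality prompt
    feat = torch.randn(1, 256, 32, 32)          # fused feature map at one scale
    print(modulate(feat, mem_prompt, modal_prompt).shape)  # torch.Size([1, 256, 32, 32])
```

In the paper's full pipeline, this modulation would presumably be repeated at multiple scales and the bank updated only for fusion results that score well, which is what lets the network reuse successful fusion patterns over time.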
Primary Area: applications to computer vision, audio, language, and other modalities
Supplementary Material: zip
Submission Number: 2252