Keywords: Image fusion, general image fusion, multimodal image fusion, latent diffusion models
TL;DR: We propose a robust, general image fusion framework that requires no additional training, adapts to diverse fusion scenarios, and effectively handles various forms of interference.
Abstract: This study proposes PDFuse, a robust, general, training-free image fusion framework built on pre-trained latent diffusion models with projection–manifold regularization. By redefining fusion as a diffusion inference process constrained by multiple source images, PDFuse adapts to varied image modalities and produces high-fidelity outputs by exploiting the diffusion prior. To ensure both source consistency and full utilization of generative priors, we develop a novel projection–manifold regularization comprising two core mechanisms. On the one hand, the Multi-source Information Consistency Projection (MICP) establishes a projection system between diffusion latent representations and source images, solved efficiently via conjugate gradients to inject multi-source information into the inference process. On the other hand, the Latent Manifold-preservation Guidance (LMG) aligns the latent distribution of diffusion variables with that of the sources, guiding generation to respect the model’s manifold prior. By alternating these two mechanisms, PDFuse balances fidelity and generative quality, achieving superior fusion performance across diverse tasks. Moreover, PDFuse constructs a canonical set of interference operators and incorporates it into the two mechanisms above, leveraging generative priors to handle various degradations during fusion without requiring clean data for supervised training. Extensive experiments demonstrate that PDFuse achieves highly competitive performance across diverse image fusion tasks. The code is publicly available at https://github.com/Leiii-Cao/PDFuse.
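To make the MICP step concrete, below is a minimal sketch of how a consistency projection solved via conjugate gradients might look. It assumes a linear interference operator `A`, a flattened latent `x0`, and a single source observation `y`, and solves the regularized normal equations `(AᵀA + ρI)x = Aᵀy + ρx0`; all names and the specific objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def micp_projection(x0, A, y, rho=1.0, iters=50, tol=1e-6):
    """Illustrative MICP-style step: project the current diffusion
    latent x0 toward consistency with a source y observed through a
    linear interference operator A, by solving
        (A^T A + rho I) x = A^T y + rho x0
    with the conjugate gradient method. Names are hypothetical."""
    b = A.T @ y + rho * x0          # right-hand side of the normal equations
    x = x0.copy()
    r = b - (A.T @ (A @ x) + rho * x)  # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A.T @ (A @ p) + rho * p   # apply the SPD system matrix
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged to the projection
            break
        p = r + (rs_new / rs) * p      # update conjugate direction
        rs = rs_new
    return x
```

In the alternating scheme the abstract describes, a step like this would be interleaved with an LMG guidance update, so each denoising iteration both enforces source consistency and stays near the diffusion model's learned manifold.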
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 21758