PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: Exemplar-based image translation has garnered significant interest from researchers due to its broad applications in multimedia/multimodal processing. Existing methods primarily employ Euclidean-based losses to implicitly establish cross-domain correspondences between exemplar and conditional images, aiming to produce high-fidelity images. However, these methods often suffer from two challenges: 1) insufficient extraction of domain-invariant features leads to low-quality cross-domain correspondences, and 2) inaccurate correspondences propagate errors through the translation process due to a lack of reliable prior guidance. To tackle these issues, we propose a novel prior-guided diffusion model with global-local contrastive learning (PROMOTE), which is trained in a self-supervised manner. Technically, global-local contrastive learning is designed to align two cross-domain images within hyperbolic space and to reduce the gap between their semantic correlation distributions using the Fisher-Rao metric, allowing the visual encoders to extract domain-invariant features more effectively. Moreover, a prior-guided diffusion model is developed that propagates the structural prior to all timesteps of the diffusion process. It is optimized with a novel prior denoising loss, mathematically derived from the transitions modified by prior information in a self-supervised manner, which successfully alleviates the impact of inaccurate correspondences on image translation. Extensive experiments across seven datasets demonstrate that PROMOTE significantly surpasses state-of-the-art methods on diverse exemplar-based image translation tasks. The source code is publicly available at http://github.com/zgj77/PROMOTE.
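The abstract references two standard geometric quantities: distances in hyperbolic space (for aligning cross-domain features) and the Fisher-Rao metric between distributions (for comparing semantic correlation distributions). The sketch below is purely illustrative and not the authors' implementation; it uses the Poincaré-ball model as one common parametrization of hyperbolic space, and the closed-form Fisher-Rao distance for discrete distributions (twice the arccosine of the Bhattacharyya coefficient).

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the Poincare ball
    (norm < 1), a standard model of hyperbolic space."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))

def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two discrete probability distributions:
    2 * arccos of the Bhattacharyya coefficient sum(sqrt(p_i * q_i))."""
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return 2.0 * np.arccos(bc)

# Toy usage: identical inputs have zero distance; differing inputs do not.
u = np.array([0.1, 0.2])
v = np.array([0.3, -0.1])
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
print(poincare_distance(u, u))       # ~0
print(poincare_distance(u, v) > 0)   # True
print(fisher_rao_distance(p, p))     # ~0
print(fisher_rao_distance(p, q) > 0) # True
```

A loss built from such distances could, in principle, pull matched cross-domain features together in hyperbolic space while aligning their correlation distributions; the paper's actual global-local contrastive objective is defined in the full text.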