Abstract: This paper provides an efficient training-free painterly image harmonization (PIH) method, dubbed FreePIH, that leverages only a
pre-trained diffusion model to achieve state-of-the-art harmonization results. Unlike existing methods that require either training
auxiliary networks or fine-tuning a large pre-trained backbone, or both, to harmonize a foreground object with a painterly-style
background image, our FreePIH tames the denoising process as a plug-in module for foreground image style transfer. Specifically, we
find that the very last few steps of the denoising (i.e., generation) process strongly correspond to the stylistic information of images,
and based on this, we propose to augment the latent features of both the foreground and background images with Gaussians for
a direct denoising-based harmonization. To guarantee the fidelity of the harmonized image, we make use of multi-scale features to enforce the consistency of the content and the stability of the foreground objects in the latent space, and meanwhile align the foreground and background to the same style. Moreover, to enrich the generation with more structural and textural details, we further integrate text prompts to attend to the latent features, hence improving the generation quality. Quantitative and qualitative evaluations on the COCO and LAION-5B datasets demonstrate that our method surpasses representative baselines by large margins.
Relevance To Conference: This paper presents an efficient training-free painterly image harmonization (PIH) method that leverages only a pre-trained diffusion model to achieve state-of-the-art harmonization results. With the composition capability of FreePIH, users gain enhanced autonomy in shaping their AIGC artworks when using DM-based generative models.
Supplementary Material: zip
Primary Subject Area: [Generation] Generative Multimedia
Submission Number: 1894