Keywords: Image Inpainting, Seamless Blend, Stable Diffusion, VAE
Abstract: Image inpainting aims to fill missing or masked regions of an image in a manner that blends with the surrounding context. While diffusion models have significantly improved the visual fidelity of inpainting, they still suffer from noticeable stitched seams, including \textbf{boundary discontinuity} and \textbf{content inconsistency} between the preserved and generated regions. We argue that these issues originate from a fundamental limitation: the latent blending of the two regions at inference, which is unaccounted for in training, creates a piece-wise latent manifold. First, the masked input encoded by the VAE does not align perfectly with the resized mask, producing a boundary discontinuity that persists through the reconstruction and denoising processes. Second, the piece-wise latent manifold violates the data-coherence assumption of diffusion models, since the two regions follow distinct distributions, leading to content inconsistency. In this work, we propose \textbf{Blend-Aware Latent Diffusion}, a unified framework that explicitly resolves these issues by aligning the model's training dynamics with the blended nature of inference. Our framework consists of two complementary components: \textbf{BlendRecon}, a blend-aware variational autoencoder that learns to decode blended latents continuously; and \textbf{BlendGen}, a novel denoising loss that explicitly regularizes the generated content to harmonize with the surrounding context. Extensive experiments demonstrate that Blend-Aware Latent Diffusion effectively mitigates stitched seams and improves perceptual quality across various scenarios, including inpainting and outpainting.
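To make the inference-time latent blending that the abstract refers to concrete, below is a minimal sketch of a standard blended denoising step in latent-diffusion inpainting. All names (`vae`-encoded `known_latents`, `unet`, `scheduler`, the diffusers-style call signatures, and the nearest-neighbor mask resize) are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of inference-time latent blending in latent-diffusion inpainting.
# The abstract argues this blend is unaccounted for during training, creating a
# piece-wise latent manifold with seams at the mask boundary.
import torch
import torch.nn.functional as F

@torch.no_grad()
def blended_denoising_step(latents, known_latents, mask, unet, scheduler, t, cond):
    # Resize the pixel-space mask (1 = generate, 0 = preserve) to latent resolution.
    # Because the VAE mixes information across the boundary, this resized mask does
    # not align exactly with the encoded content -- one source of boundary seams.
    latent_mask = F.interpolate(mask, size=latents.shape[-2:], mode="nearest")

    # Predict noise on the full latent and take one denoising step.
    noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Re-noise the VAE-encoded known region to the current timestep.
    noise = torch.randn_like(known_latents)
    noisy_known = scheduler.add_noise(known_latents, noise, t)

    # Piece-wise blend: generated content inside the mask, preserved content outside.
    # The two pieces follow distinct distributions, which the abstract identifies as
    # the cause of content inconsistency.
    return latent_mask * latents + (1.0 - latent_mask) * noisy_known
```

In this sketch the blend is applied at every denoising step, so the final latent is a composite of two differently-distributed pieces before it is passed to the VAE decoder; BlendRecon and BlendGen, as described in the abstract, target the decoding and denoising sides of this mismatch respectively.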
Supplementary Material: pdf
Primary Area: generative models
Submission Number: 10654