Abstract: Finding an initial noise vector that produces an input
image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The
state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs [28])
to deterministically noise the image to the intermediate
state along the path that the denoising would follow given
the original conditioning. However, DDIM inversion for
real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To
alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion
method that draws inspiration from affine coupling layers.
EDICT enables mathematically exact inversion of real and
model-generated images by maintaining two coupled noise
vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion [24], a state-of-the-art
latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex
image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square
error of reconstruction by a factor of two. Using noise
vectors inverted from real images, EDICT enables a wide
range of image edits—from local and global semantic edits to image stylization—while maintaining fidelity to the
original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be
combined with any pretrained DDM. Code is available at
https://github.com/salesforce/EDICT.
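The central mechanism named in the abstract, two coupled noise vectors that invert each other in an alternating fashion, can be illustrated with a small sketch in the spirit of affine coupling layers. The code below is an assumption-laden illustration, not the paper's implementation: the coefficients a_t and b_t stand in for a DDIM-style schedule, the mixing weight p and the placeholder eps() noise predictor are hypothetical, and only the algebraic structure (each affine update depends on the other sequence, so it can be undone exactly) is the point being shown.

```python
# Minimal sketch of a coupled, exactly invertible denoising step,
# inspired by the alternating update EDICT describes.
# a_t, b_t, p, and eps() are illustrative assumptions, not the paper's code.
import torch


def eps(z, t):
    # Placeholder for a pretrained noise-prediction network eps_theta(z, t).
    # Seeded by t so the toy function is deterministic, which is all
    # that exact inversion requires.
    torch.manual_seed(t)
    return 0.1 * torch.randn_like(z)


def denoise_step(x, y, t, a_t, b_t, p=0.93):
    # Each sequence is updated using the noise prediction computed from the
    # *other* sequence, then the two are mixed. Every operation is affine,
    # so the whole step is invertible in closed form.
    x_inter = a_t * x + b_t * eps(y, t)
    y_inter = a_t * y + b_t * eps(x_inter, t)
    x_next = p * x_inter + (1 - p) * y_inter
    y_next = p * y_inter + (1 - p) * x_next
    return x_next, y_next


def invert_step(x_next, y_next, t, a_t, b_t, p=0.93):
    # Exact algebraic inverse of denoise_step, applied in reverse order.
    y_inter = (y_next - (1 - p) * x_next) / p
    x_inter = (x_next - (1 - p) * y_inter) / p
    y = (y_inter - b_t * eps(x_inter, t)) / a_t
    x = (x_inter - b_t * eps(y, t)) / a_t
    return x, y


if __name__ == "__main__":
    # Both sequences start from the same latent, as a pair of coupled copies.
    x = torch.randn(1, 4, 64, 64)
    y = x.clone()
    x_next, y_next = denoise_step(x, y, t=10, a_t=0.99, b_t=0.14)
    x_rec, y_rec = invert_step(x_next, y_next, t=10, a_t=0.99, b_t=0.14)
    print(torch.allclose(x_rec, x, atol=1e-5), torch.allclose(y_rec, y, atol=1e-5))
```

Because each update reads the noise prediction only from the partner sequence, the inverse never has to guess its own input, which is the same property that makes affine coupling layers in normalizing flows exactly invertible; this is why no local linearization (and hence no error accumulation) is needed, in contrast to plain DDIM inversion.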