Entwined Inversion: Tune-Free Inversion For Real Image Faithful Reconstruction and Editing

Published: 14 Apr 2024, Last Modified: 23 May 2024. ICASSP 2024, IEEE International Conference on Acoustics, Speech and Signal Processing. License: CC BY 4.0
Abstract: Text-conditional image editing is a practical AIGC task that has recently attracted strong commercial and academic interest. For real-image editing, most diffusion-model-based methods use DDIM Inversion as the first stage before editing, but DDIM Inversion often fails to reconstruct the original image faithfully, degrading all downstream edits. To address this problem, we first analyze mathematically why DDIM Inversion fails at reconstruction, and then propose a new inversion and sampling method, Entwined Inversion, that achieves satisfactory reconstruction and editing performance by meeting two key requirements: 1) the edited image retains the main content of the original image; 2) the edited result conforms to the semantics of the text prompt. In addition, our method requires neither training the diffusion model on a large dataset nor fine-tuning it for particular images.
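For context, the sketch below illustrates the standard DDIM inversion loop that the abstract refers to as the usual first stage of real-image editing, not the Entwined Inversion method itself. It runs the deterministic DDIM update in reverse to map a clean latent back toward noise; the noise predictor `eps_model`, the function names, and the toy setup are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of standard DDIM Inversion (the baseline the paper critiques).
# Assumes a deterministic DDIM update and a placeholder noise predictor.
import torch

def ddim_invert(x0, eps_model, alphas_cumprod, num_steps=50):
    """Map a clean latent x0 toward noise by applying the deterministic
    DDIM update in reverse order of noise levels (t -> t_next > t)."""
    T = len(alphas_cumprod)
    timesteps = torch.linspace(0, T - 1, num_steps, dtype=torch.long)
    x = x0
    for i in range(num_steps - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = eps_model(x, t)                      # predicted noise at step t
        # predicted clean latent from the current noisy latent
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # deterministic DDIM step taken "forward" in noise level
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximately the noise that reconstructs x0 under DDIM sampling

if __name__ == "__main__":
    # Toy usage: a random latent and a dummy noise predictor (hypothetical).
    alphas = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
    dummy_eps = lambda x, t: torch.zeros_like(x)
    x0 = torch.randn(1, 4, 64, 64)
    x_T = ddim_invert(x0, dummy_eps, alphas)
    print(x_T.shape)
```

In practice, the prediction error of `eps_model` accumulates across these reversed steps, which is the source of the reconstruction failure the abstract describes.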