Keywords: diffusion models, text-to-image synthesis, image editing
TL;DR: CycleDiffusion enables stochastic diffusion models to perform text-guided editing of real images
Abstract: Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL∙E 2, Imagen, Stable Diffusion). This paper introduces a simple method that uses stochastic text-to-image diffusion models as zero-shot image editors. Our method, CycleDiffusion, is based on the finding that when all random variables (the "random seed") are fixed, two similar text prompts produce similar images. The core of our idea is to infer the random variables that are likely to generate a source image conditioned on a source text. With the inferred random variables, the text-to-image diffusion model then generates a target image conditioned on a target text. Our experiments show that CycleDiffusion outperforms SDEdit and the ODE-based DDIB method, and it can be further improved by Cross Attention Control. Demo: https://huggingface.co/spaces/ChenWu98/Stable-CycleDiffusion.
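The abstract's core idea can be illustrated with a toy sketch: each DDPM reverse step has the form x_{t-1} = mu(x_t, t, c) + sigma_t * z_t, so given a source trajectory and the source condition one can solve for the noise variables z_t and replay them under a target condition. The snippet below is a minimal NumPy illustration under invented assumptions (a toy linear mean predictor `mu`, a fixed noise scale, and 2-D "prompt embeddings"); it is not the paper's implementation or a real diffusion model.

```python
import numpy as np

# Toy stand-in for a conditional denoiser's mean predictor mu(x_t, t, c).
# Everything here (mu, SIGMA, the 2-D "embeddings") is illustrative only.
def mu(x, t, c):
    # Pull the sample toward the condition vector c at each step.
    return 0.9 * x + 0.1 * c

SIGMA = 0.1  # fixed per-step noise scale for this toy chain

def sample(x_T, c, zs):
    """Run the reverse chain with a fixed list of noise variables zs."""
    x = x_T
    for z in zs:
        x = mu(x, None, c) + SIGMA * z
    return x

def infer_noises(traj, c):
    """Invert the update rule to recover the z_t that produced traj."""
    zs = []
    for x_t, x_prev in zip(traj[:-1], traj[1:]):
        zs.append((x_prev - mu(x_t, None, c)) / SIGMA)
    return zs

rng = np.random.default_rng(0)
c_src = np.array([1.0, 0.0])
c_tgt = np.array([0.9, 0.1])  # a "similar" target prompt embedding

# Simulate a source trajectory (standing in for encoding a real source image).
x = rng.normal(size=2)
traj = [x]
for _ in range(10):
    x = mu(x, None, c_src) + SIGMA * rng.normal(size=2)
    traj.append(x)

zs = infer_noises(traj, c_src)       # infer the "random seed"
x_tgt = sample(traj[0], c_tgt, zs)   # replay it under the target prompt
# With similar prompts and the same noise, the edit stays close to the source.
print(np.linalg.norm(x_tgt - traj[-1]))
```

Replaying the inferred noises with the source condition reconstructs the source trajectory exactly, while a nearby target condition yields a nearby output, mirroring the "fixed random seed, similar prompts, similar images" observation.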
Student Paper: Yes