MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

Qian Wang; Biao Zhang; Michael Birsak; Peter Wonka

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

Published: 17 Sept 2024, Last Modified: 17 Sept 2024Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0

Abstract: Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, including intermediate latent, conditional embedding, cross attention maps, guidance, and predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. Particularly, we identified one specific configuration as a new type of control by manipulating the predicted noise, which can perform higher-quality edits than previous work for a variety of local and global edits.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: - Added a broader impact section in the main paper. - Expanded the discussion of the quantitative results in the main paper. - Fixed inconsistencies in Figure 3 in the main paper.

Code: https://github.com/QianWangX/MDP-Diffusion

Supplementary Material: zip

Assigned Action Editor: ~Shiyu_Chang2

Submission Number: 2666

Loading