Conditional Image Synthesis with Diffusion Models: A Survey

TMLR Paper3658 Authors

10 Nov 2024 (modified: 28 Nov 2024)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems to be solved in the future and suggest some possible solutions.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Robert_Legenstein1
Submission Number: 3658
Loading