Multitask Image-to-Image Diffusion Models with Fine-Grained Control

20 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: generative models, computer vision, machine learning, image-to-image translation, diffusion models, multitask image editing
Abstract: Diffusion models have recently been applied to various image restoration and editing tasks, showing remarkable results in commercial products, e.g., Adobe Photoshop. While recent approaches to text-based editing have shown flexibility and great editing capacity, they still lack fine-grained control and/or multi-task compositing capabilities. In everyday applications, however, having a single tool for image editing with detailed user control across multiple tasks is highly preferred. This paper proposes a multi-task image-to-image diffusion model that allows fine-grained image editing among multiple tasks within a single model. Our approach builds upon conditional diffusion models and jointly models the input images and the input compositing effects, including motion blur, film grain, colorization, image sharpening, and inpainting. We present a novel input conditioning formulation and observe that using explicit binary task activation labels and cross-attention-based feature conditioning are key to allowing the model to achieve multi-task editing. In addition, we introduce a novel benchmark dataset for image compositing effects with standard image metrics for advancing the state of the art. Our approach can manipulate natural images with fine-grained, disentangled user control on single- and multi-task editing setups and generalizes well across different domains and even to unseen data distributions. We present experimental results on various compositing tasks to show that our approach outperforms existing techniques and baselines.
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2847
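
The abstract mentions "explicit binary task activation labels" as one part of the input conditioning. The paper's actual formulation is not given on this page, so the following is only a minimal sketch of one way such a conditioning vector might be assembled. The task ordering, the `build_condition` helper, the per-task strength values, and the concatenated layout are all assumptions made for illustration. The cross-attention-based feature conditioning is not covered.

```python
# Hypothetical sketch: none of these names come from the paper.
# Task names follow the effects listed in the abstract; the order is arbitrary.
TASKS = ("motion_blur", "film_grain", "colorization", "sharpening", "inpainting")


def build_condition(strengths: dict[str, float]) -> list[float]:
    """Return [binary activation labels..., per-task strengths...].

    A task is treated as active if it appears in `strengths`, so
    "task off" and "task on with strength 0.0" yield different vectors.
    """
    unknown = set(strengths) - set(TASKS)
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    # One binary label per task: 1.0 if requested, else 0.0.
    active = [1.0 if task in strengths else 0.0 for task in TASKS]
    # One strength value per task; inactive tasks default to 0.0.
    values = [float(strengths.get(task, 0.0)) for task in TASKS]
    return active + values


if __name__ == "__main__":
    # Request film grain and sharpening together (a multi-task edit).
    print(build_condition({"film_grain": 0.3, "sharpening": 0.8}))
```

Keeping the label separate from the strength is the design choice this sketch illustrates: without the label, a strength of 0.0 would be indistinguishable from a task that was never requested.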