Diffusion Compose: Compositional Depth Aware Scene Editing in Diffusion Models

Rishubh Parihar; Sachidanand VS; Sabariswaran M; Venkatesh Babu Radhakrishnan

20 Sept 2024 (modified: 14 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: Image editing, 3D-aware scene control, Text-to-Image Diffusion Models

TL;DR: We propose Diffusion Compose, a zero-shot approach to perform depth-aware scene editing using Text-to-Image diffusion models.

Abstract: We introduce Diffusion Compose, a zero-shot approach for depth-aware scene editing using Text-to-Image diffusion models. Existing methods for 3D-aware editing focus on object-centric control and do not support compositional depth-aware edits, such as placing objects at specific depths or combining multiple scenes realistically. We address this by incorporating a depth-based multiplane scene representation into diffusion models. These planes, placed at fixed depths, can be individually edited or composed to enable 3D-aware scene modifications. However, directly manipulating the multiplane representation of diffusion latents often leads to identity loss or unrealistic blending. To overcome this, we propose a novel multiplane feature guidance technique that gradually aligns the source latents with the target edit at each denoising step. We validate Diffusion Compose on two challenging tasks: a) scene composition, which blends scenes with consistent depth order and scene illumination, and b) depth-aware object insertion, which inserts novel objects at specified depths in a scene while preserving occlusions, scene structure, and illumination. Extensive experiments demonstrate that Diffusion Compose significantly outperforms task-specific baselines for object placement and harmonization. A user study further confirms that it produces realistic, identity-preserving, and accurate depth-aware scene edits.
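The multiplane idea in the abstract can be illustrated with a minimal sketch of generic multiplane compositing. It is not the authors' implementation: it works on a tiny grid of scalar values rather than diffusion latents, it omits the feature guidance step entirely, and the names `split_into_planes`, `composite`, and `insert_plane` are hypothetical helpers introduced only for this example. It shows how planes at fixed depths, composed back to front, might yield correct occlusion when a new object is inserted at a chosen depth.

```python
from typing import List, Tuple

Grid = List[List[float]]        # H x W scalar values (stand-in for pixels or latents)
Plane = Tuple[Grid, Grid]       # (premultiplied values, alpha mask)


def split_into_planes(image: Grid, depth: Grid, boundaries: List[float]) -> List[Plane]:
    """Assign each location to the plane whose depth range contains it.

    `boundaries` must be increasing; planes are returned ordered near to far.
    """
    planes = []
    last = len(boundaries) - 2
    for k in range(len(boundaries) - 1):
        lo, hi = boundaries[k], boundaries[k + 1]
        # Half-open ranges [lo, hi); the last plane also includes its far boundary.
        alpha = [[1.0 if (lo <= d < hi or (k == last and d == hi)) else 0.0
                  for d in row] for row in depth]
        values = [[v * a for v, a in zip(vrow, arow)]
                  for vrow, arow in zip(image, alpha)]
        planes.append((values, alpha))
    return planes


def composite(planes: List[Plane]) -> Grid:
    """Back-to-front 'over' compositing of planes ordered near to far."""
    h, w = len(planes[0][0]), len(planes[0][0][0])
    out = [[0.0] * w for _ in range(h)]
    for values, alpha in reversed(planes):  # start from the farthest plane
        for i in range(h):
            for j in range(w):
                out[i][j] = values[i][j] + (1.0 - alpha[i][j]) * out[i][j]
    return out


def insert_plane(planes: List[Plane], plane_depths: List[float],
                 new_plane: Plane, new_depth: float) -> List[Plane]:
    """Insert a plane so that the list stays sorted near to far."""
    idx = sum(1 for d in plane_depths if d <= new_depth)
    return planes[:idx] + [new_plane] + planes[idx:]


# Toy 1 x 3 scene: the left location is near, the other two are far.
image = [[0.2, 0.4, 0.8]]
depth = [[0.1, 0.6, 0.9]]
planes = split_into_planes(image, depth, [0.0, 0.5, 1.0])
recon = composite(planes)       # composing unedited planes recovers the scene

# Hypothetical object of value 1.0 covering the two leftmost locations,
# inserted between the near plane (depth 0.25) and the far plane (depth 0.75).
obj: Plane = ([[1.0, 1.0, 0.0]], [[1.0, 1.0, 0.0]])
edited = composite(insert_plane(planes, [0.25, 0.75], obj, 0.5))
```

In this toy scene the inserted object is hidden behind the near content at the left location and covers the far content at the middle location, which is the occlusion behaviour that depth-aware insertion is meant to preserve.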

Primary Area: generative models

Submission Number: 2157
