SpatialComposer: 3D Spatial Object Insertion via Image Gaussian Composition

ICLR 2026 Conference Submission12728 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Diffusion Model, Image Edit, Image Gaussian, Object Insertion

TL;DR: We propose SpatialComposer, which leverages depth-aware image Gaussians to achieve precise depth-related positional control for object insertion, combined with a training-free object refinement method.

Abstract: Open-world image generation models have advanced rapidly in recent years, and many image editing tasks now achieve excellent performance. Object insertion, a representative example, still presents three primary challenges. First, the inserted object should remain identity-consistent with the reference object, and the original scene should be preserved in non-edited regions. Second, the spatial position and scale of the inserted object should be plausible and match user expectations. Third, the inserted object should harmonize with the rest of the image, which typically involves matching object style and surface illumination. To address these challenges, we propose SpatialComposer, which leverages depth-aware image Gaussians to construct a spatially structured scene representation from a single scene image and models object insertion as Gaussian composition. This preserves scene and object identity while enabling precise control over the scale and 3D spatial position of the inserted object. Building on pre-trained diffusion generative models, we then introduce a simple yet effective refinement method for object harmonization. By designating only the Gaussian components of the inserted object as trainable parameters, SpatialComposer avoids unintended modifications to other regions, addressing both object-scene integration and scene detail preservation. Furthermore, because current object insertion benchmarks do not consider depth-aware position control, we construct a specialized benchmark of high-resolution scene images with substantial depth complexity. Comprehensive evaluations show that SpatialComposer performs comparably to or better than state-of-the-art object insertion approaches on all three challenges.
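The idea of modeling insertion as a composition of depth-aware Gaussians can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the Gaussian parameterization or the renderer, so the isotropic 2D Gaussians, the per-Gaussian depth value, the front-to-back alpha compositing, and every function name below (`make_gaussians`, `place_object`, `compose`, `render`) are assumptions chosen only to show how a depth value could control occlusion when object Gaussians are merged into a scene.

```python
import numpy as np

def make_gaussians(means, sigmas, colors, opacities, depths):
    """Hypothetical container: isotropic 2D Gaussians, each with a scalar depth."""
    return {
        "mean": np.asarray(means, dtype=float),         # (N, 2) pixel coords (x, y)
        "sigma": np.asarray(sigmas, dtype=float),       # (N,) standard deviation in pixels
        "color": np.asarray(colors, dtype=float),       # (N, 3) RGB in [0, 1]
        "opacity": np.asarray(opacities, dtype=float),  # (N,) peak alpha in [0, 1]
        "depth": np.asarray(depths, dtype=float),       # (N,) smaller means closer to camera
    }

def place_object(obj, offset_xy, depth, scale=1.0):
    """Scale the object about its centroid, shift it, and move it to a target depth."""
    center = obj["mean"].mean(axis=0)
    placed = {k: v.copy() for k, v in obj.items()}
    placed["mean"] = (obj["mean"] - center) * scale + center + np.asarray(offset_xy, dtype=float)
    placed["sigma"] = obj["sigma"] * scale
    placed["depth"] = obj["depth"] - obj["depth"].mean() + depth
    return placed

def compose(scene, obj):
    """Insertion as composition: concatenate the two Gaussian sets."""
    return {k: np.concatenate([scene[k], obj[k]], axis=0) for k in scene}

def render(g, height, width):
    """Front-to-back alpha compositing of the Gaussians, sorted by depth."""
    ys, xs = np.mgrid[0:height, 0:width]
    image = np.zeros((height, width, 3))
    transmittance = np.ones((height, width))
    for i in np.argsort(g["depth"]):  # nearest Gaussian first
        dx = xs - g["mean"][i, 0]
        dy = ys - g["mean"][i, 1]
        alpha = g["opacity"][i] * np.exp(-(dx**2 + dy**2) / (2.0 * g["sigma"][i] ** 2))
        image += (transmittance * alpha)[..., None] * g["color"][i]
        transmittance *= 1.0 - alpha
    return image

# Toy scene: one red Gaussian at depth 5; toy object: one blue Gaussian.
scene = make_gaussians([[8, 8]], [3.0], [[1, 0, 0]], [1.0], [5.0])
obj = make_gaussians([[8, 8]], [3.0], [[0, 0, 1]], [1.0], [0.0])

# The same object placed in front of, then behind, the scene Gaussian.
front = render(compose(scene, place_object(obj, (0, 0), depth=2.0)), 16, 16)
back = render(compose(scene, place_object(obj, (0, 0), depth=9.0)), 16, 16)
```

In this sketch the scene Gaussians are never modified by `place_object` or `compose`, which mirrors, in a very simplified way, the abstract's point that only the inserted object's components are adjusted. A refinement step in the spirit of the abstract would update only the entries contributed by the object, but how that optimization is driven by a diffusion model is not described in the text above and is not sketched here.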

Supplementary Material: zip

Primary Area: generative models

Submission Number: 12728
