MMEditor: Multimodal Prompt-Driven 3D Gaussian Splatting Editing

Published: 2025, Last Modified: 05 Nov 2025. Venue: ICASSP 2025. License: CC BY-SA 4.0
Abstract: We propose MMEditor, a multimodal 3D scene editing framework that creates or modifies objects within an existing 3D Gaussian Splatting (3DGS) scene according to text and image prompts. MMEditor employs a multimodal image editing module to iteratively optimize the 3D Gaussians in the editing regions, yielding fine-grained, multi-view-consistent 3D edits. This module edits with accurate appearance and location control, which is achieved through two designs. First, a multimodal adapter block treats the reference image as a foreign language that augments the text prompt, so that editing results align with both the generic text description and the unique characteristics of the reference image. Second, an attention-based localization block localizes cross-attention using user-defined 3D bounding boxes, ensuring that edits occur within the editing regions. Experiments demonstrate that our method achieves more accurate and controllable results than previous state-of-the-art methods.
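The idea of localizing cross-attention with a user-defined 3D bounding box can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project_bbox_to_mask` and `localize_cross_attention` are hypothetical, and the sketch assumes a pinhole camera, a box whose corners all lie in front of the camera, a hard binary mask taken from the 2D rectangle enclosing the projected corners, and an attention map at image resolution with one row per pixel.

```python
import numpy as np


def project_bbox_to_mask(corners_world, K, world_to_cam, height, width):
    """Project the 8 corners of a 3D box and rasterize their enclosing 2D rectangle.

    corners_world: (8, 3) box corners in world coordinates.
    K:             (3, 3) pinhole intrinsics (assumed camera model).
    world_to_cam:  (4, 4) rigid transform from world to camera space.
    """
    ones = np.ones((corners_world.shape[0], 1))
    homo = np.hstack([corners_world, ones])          # (8, 4) homogeneous points
    cam = (world_to_cam @ homo.T).T[:, :3]           # (8, 3) camera-space points
    if np.any(cam[:, 2] <= 0):
        raise ValueError("this sketch assumes all corners are in front of the camera")

    pix = (K @ cam.T).T                              # (8, 3) unnormalized pixels
    pix = pix[:, :2] / pix[:, 2:3]                   # perspective divide

    # Enclosing rectangle of the projected corners, clipped to the image.
    x0, y0 = np.floor(pix.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pix.max(axis=0)).astype(int)
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)

    mask = np.zeros((height, width), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask


def localize_cross_attention(attn, mask, edit_token_ids):
    """Suppress the edit tokens' attention outside the mask, then renormalize rows.

    attn:           (H*W, T) cross-attention probabilities, one row per pixel.
    mask:           (H, W) binary mask of the editing region in this view.
    edit_token_ids: indices of the prompt tokens that describe the edit.
    """
    out = attn.copy()
    flat = mask.reshape(-1, 1)                       # (H*W, 1)
    out[:, edit_token_ids] *= flat                   # zero edit tokens outside the box
    out /= out.sum(axis=1, keepdims=True) + 1e-8     # keep each row a distribution
    return out
```

In this sketch the mask is recomputed for each rendered view from that view's camera, so the same 3D box constrains the 2D edit in every view. A soft mask or a lower-resolution attention grid would be equally plausible choices; the abstract does not specify either.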