\begin{abstract}

Rapid progress in 3D visual generative AI is driven by improvements in the quality and realism of 2D generative models, together with recent advances in efficient 3D reconstruction. In this work, we address 3D editing by developing a multi-view-consistent 2D editing model and using 3D reconstruction methods to obtain a 3D representation from the edited views. Our approach generalizes across input types, including renderings of digital 3D assets and turntable videos of real-world objects. This generality also allows our method to be applied as a post-processing step to any existing 3D generative approach, regardless of the underlying geometry representation.

We introduce \we{}, a model that integrates 2D generation with 3D reconstruction to facilitate 3D editing. A key component of our approach is MV Instruct Pix2Pix XL, a modified version of InstructPix2Pix \cite{brooks2023instructpix2pix} built on the Stable Diffusion XL \cite{podell2023sdxl} image generation model and designed to generate consistent multi-view images of the same object. To ensure coherence across views, we employ a novel interpolation mechanism that edits multiple images consistently in a single inference pass. We further improve output fidelity with a super-resolution upscaling step. The geometry of the asset is estimated using a state-of-the-art 3D Gaussian Splatting \cite{kerbl20233d} model. The resulting \we{} model balances appearance refinement and geometric accuracy, preserving high-frequency details and producing high-fidelity results.

\end{abstract}
