$\texttt{EditCast3D}$: Single-Frame-Guided 3D Editing with Video Propagation and View Selection

17 Sept 2025 (modified: 18 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: 3D Editing, Video Editing, 3D Reconstruction, Foundation Model
Abstract: Recent advances in foundation models have driven remarkable progress in image editing, yet their extension to 3D editing remains underexplored. A natural approach is to replace the image editing modules in existing workflows with foundation models. However, their heavy computational demands, together with the restrictions and costs of closed-source APIs, make it impractical to plug these models into existing iterative editing strategies. To address this limitation, we propose $\texttt{EditCast3D}$, a pipeline that employs video generation foundation models to propagate edits from a single first frame across the entire dataset prior to reconstruction. While edit propagation enables dataset-level editing via video models, its consistency remains suboptimal for 3D reconstruction, where multi-view alignment is essential. To overcome this, $\texttt{EditCast3D}$ introduces a view selection strategy that explicitly identifies consistent and reconstruction-friendly views, and it adopts feedforward reconstruction without requiring costly refinement. In combination, the pipeline both minimizes reliance on expensive image editing and mitigates the prompt ambiguities that arise when foundation models are applied independently across images. We evaluate $\texttt{EditCast3D}$ on commonly used 3D editing datasets and compare it against state-of-the-art 3D editing baselines, demonstrating superior editing quality and high efficiency. These results establish $\texttt{EditCast3D}$ as a scalable and general paradigm for integrating foundation models into 3D editing pipelines.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9856