Keywords: 4D reconstruction, feedforward model, diffusion model, 3D Gaussian Splatting
TL;DR: We introduce a pose-free, feedforward framework for 4D scene reconstruction from unposed images.
Abstract: Autonomous vehicles require diverse dynamic scenes for robust training and evaluation, yet existing dynamic scene reconstruction methods are often limited by slow per-scene optimization and reliance on explicit annotations or camera calibration. In this paper, we introduce a pose-free, feedforward framework for 4D scene reconstruction that jointly infers camera parameters, dynamic Gaussian representations, and 3D motion directly from sparse, unposed images. Unlike prior feedforward approaches, our model accommodates an arbitrary number of input views, enabling long-sequence modeling and improved generalization. Dynamic objects are disentangled via estimated motion and aggregated into unified 3D Gaussian Splatting (3DGS) representations, while a diffusion-based refinement module mitigates flow artifacts and enhances novel view synthesis under sparse inputs. Trained on the Waymo Dataset and evaluated on nuScenes and Argoverse2, our method achieves superior performance while generalizing effectively across datasets, benefiting from the pose-free design that reduces dataset-specific biases. Additionally, the framework supports instance-level scene editing and high-fidelity view synthesis, providing a scalable foundation for real-world autonomous driving simulation.
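To make the described interface concrete, below is a minimal, hypothetical sketch of what a pose-free, feed-forward 4D reconstruction model could look like: it maps an arbitrary number of unposed images to per-view camera parameters, per-pixel 3D Gaussian parameters, and 3D motion. This is not the authors' implementation; all module names, tensor layouts, and dimensions (e.g. `PoseFree4DReconstructor`, 14 Gaussian channels) are illustrative assumptions.

```python
# Minimal conceptual sketch (not the paper's code): a pose-free, feed-forward
# 4D reconstruction interface. All names and dimensions are assumptions.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class SceneOutput:
    camera_params: torch.Tensor  # (V, 12): assumed per-view rotation + translation
    gaussians: torch.Tensor      # (V, 14, H', W'): xyz, scale, rotation, opacity, color
    motion: torch.Tensor         # (V, 3, H', W'): 3D displacement per predicted Gaussian


class PoseFree4DReconstructor(nn.Module):
    """Hypothetical feed-forward model: unposed images -> cameras, dynamic 3DGS, motion."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Stand-in image encoder; a real model would use a stronger backbone.
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=8, stride=8)
        self.camera_head = nn.Linear(feat_dim, 12)                     # per-view pose parameters
        self.gaussian_head = nn.Conv2d(feat_dim, 14, kernel_size=1)    # per-pixel Gaussian parameters
        self.motion_head = nn.Conv2d(feat_dim, 3, kernel_size=1)       # per-pixel 3D motion

    def forward(self, images: torch.Tensor) -> SceneOutput:
        # images: (V, 3, H, W) unposed frames; V is arbitrary (no fixed view count).
        feats = self.backbone(images)                    # (V, C, H/8, W/8)
        cams = self.camera_head(feats.mean(dim=(2, 3)))  # pooled features -> (V, 12)
        gaussians = self.gaussian_head(feats)            # (V, 14, H/8, W/8)
        motion = self.motion_head(feats)                 # (V, 3, H/8, W/8)
        return SceneOutput(cams, gaussians, motion)


if __name__ == "__main__":
    model = PoseFree4DReconstructor()
    out = model(torch.randn(6, 3, 256, 256))  # six unposed input views
    print(out.camera_params.shape, out.gaussians.shape, out.motion.shape)
```

In this sketch, the dynamic/static disentanglement and the diffusion-based refinement described in the abstract would operate on the predicted motion and Gaussian fields downstream; they are omitted here for brevity.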
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5342