Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)
License: CC BY 4.0
Keywords: Martian 3D Video Synthesis, Multimodal Mars Data Curation, Physics‑Accurate Terrain Reconstruction
Abstract: Synthesizing realistic Martian landscape videos, which is essential for mission rehearsal and robotic simulation, presents unique challenges. These stem primarily from the scarcity of high-quality Martian data and the significant domain gap relative to terrestrial imagery. To address these challenges, we introduce a holistic solution comprising two main components: 1) a data curation framework, Multimodal Mars Synthesis (M3arsSynth), which processes stereo navigation images to render high-fidelity 3D video sequences; and 2) a video-based Martian terrain generator (MarsGen), which uses multimodal conditioning data to accurately synthesize novel, 3D-consistent frames. Our data are sourced from NASA’s Planetary Data System (PDS) and cover diverse Martian terrains and dates, enabling the production of physics-accurate 3D surface models at metric-scale resolution. During inference, MarsGen is conditioned on an initial image frame and can be guided by specified camera trajectories or textual prompts to generate new environments. Experimental results demonstrate that our solution surpasses video synthesis approaches trained on terrestrial data, achieving superior visual quality and 3D structural consistency.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/LongfeiLi/M3arsSynth
Code URL: https://github.com/loongfeili/Martian-World-Model
Supplementary Material: zip
Primary Area: Machine learning approaches to data and benchmarks enrichment, augmentation and processing (supervised, unsupervised, online, active, fine-tuning, RLHF, SFT, alignment, etc.)
Submission Number: 317
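
Illustrative note: the abstract states that M3arsSynth derives metric-scale 3D surface models from stereo navigation images. The listing does not describe how, so the following is only a minimal sketch of the standard rectified-stereo relation (depth = focal length × baseline / disparity) that makes metric scale recoverable from a calibrated stereo pair. It is not the authors' pipeline. The function names and all numeric values (focal length, baseline, principal point) are hypothetical and are not actual rover camera parameters.

```python
from typing import Optional, Tuple


def disparity_to_depth(disparity_px: float,
                       focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Depth in metres for a rectified stereo pair: Z = f * B / d.

    Returns None when the disparity is not positive (no valid match,
    or a point effectively at infinity).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def backproject(u: float, v: float, depth_m: float,
                focal_px: float, cx: float, cy: float
                ) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame,
    assuming a pinhole camera with square pixels and principal point (cx, cy)."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


# Hypothetical values for illustration only.
FOCAL_PX = 1000.0    # assumed focal length in pixels
BASELINE_M = 0.5     # assumed stereo baseline in metres
CX, CY = 500.0, 400.0  # assumed principal point

depth = disparity_to_depth(10.0, FOCAL_PX, BASELINE_M)
point = backproject(600.0, 400.0, depth, FOCAL_PX, CX, CY)
print(depth, point)
```

Because the baseline is expressed in metres, the recovered depths and 3D points are in metres as well, which is why stereo input gives a metric scale that monocular video alone does not.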