One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

Yifan Zhu, Aaron Dollar, Zherong Pan

Published: 21 Jun 2025, Last Modified: 21 Jun 2025

Venue: SWOMO RSS25 Oral

License: CC BY 4.0

Keywords: Rigid Body World Model, Differentiable Simulation and Rendering

Abstract: Identifying predictive world models for robots from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable programming to identify world models cannot jointly optimize the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows the joint identification of these properties. Our method employs a differentiable point-based geometry representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, we achieve end-to-end optimization of world models of rigid objects, given the sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready rigid world models from only one robot action sequence.

Submission Number: 5
