Keywords: simulation learning, dynamics, forecasting, particle dynamics, learning from videos
TL;DR: 3DGSim learns 3D simulators from RGB videos by jointly training inverse rendering and dynamics forecasting.
Abstract: Realistic simulation is critical for applications ranging from robotics to animation.
Video generation models have emerged as a way to capture real-world physics from data, but they often struggle to maintain spatial consistency and object permanence, relying on memory mechanisms to compensate.
As a complementary direction, we present 3DGSim, a 3D simulator that learns physical interactions directly from multi-view RGB videos.
3DGSim combines MVSplat to learn a latent, particle-based representation of 3D scenes, a Point Transformer to model particle dynamics, a Temporal Merging module for consistent temporal aggregation, and Gaussian Splatting to render novel views.
By jointly training inverse rendering and dynamics forecasting, 3DGSim embeds physical properties into point-wise latent features. This enables the model to capture diverse behaviors, from rigid and elastic to cloth-like dynamics and boundary conditions (e.g., fixed cloth corners), while producing realistic lighting effects. We show that 3DGSim can generate physically plausible results even in out-of-distribution cases, e.g., ground removal or multi-object interactions, despite being trained only on single-body collisions.
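To make the joint objective concrete, below is a minimal sketch (not the authors' code) of the training step the abstract describes: an encoder maps multi-view RGB frames to latent particles, a dynamics module forecasts the next particle state, and a renderer maps particles back to images, with a rendering loss and a forecasting loss optimized together. All module names, shapes, and loss weights are illustrative assumptions standing in for the MVSplat encoder, Point Transformer, and Gaussian Splatting renderer.

```python
# Hypothetical sketch of the joint inverse-rendering + dynamics-forecasting objective.
import torch
import torch.nn as nn

N_PARTICLES, FEAT_DIM, IMG_DIM = 256, 32, 3 * 64 * 64  # assumed toy sizes

class Encoder(nn.Module):          # placeholder for the MVSplat-style inverse renderer
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(IMG_DIM, N_PARTICLES * FEAT_DIM)
    def forward(self, views):      # views: (B, IMG_DIM) flattened multi-view RGB
        return self.net(views).view(-1, N_PARTICLES, FEAT_DIM)

class Dynamics(nn.Module):         # placeholder for the Point Transformer forecaster
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(FEAT_DIM, FEAT_DIM)
    def forward(self, particles):  # particles: (B, N, F) -> predicted next state
        return self.net(particles)

class Renderer(nn.Module):         # placeholder for Gaussian Splatting novel-view rendering
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(N_PARTICLES * FEAT_DIM, IMG_DIM)
    def forward(self, particles):
        return self.net(particles.flatten(1))

encoder, dynamics, renderer = Encoder(), Dynamics(), Renderer()
params = list(encoder.parameters()) + list(dynamics.parameters()) + list(renderer.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

# One joint training step on a dummy pair of consecutive observations.
views_t  = torch.randn(4, IMG_DIM)   # observation at time t
views_t1 = torch.randn(4, IMG_DIM)   # observation at time t+1 (supervision)

particles_t = encoder(views_t)
particles_t1_pred = dynamics(particles_t)

render_loss   = nn.functional.mse_loss(renderer(particles_t), views_t)          # inverse rendering
dynamics_loss = nn.functional.mse_loss(renderer(particles_t1_pred), views_t1)   # forecast rendered at t+1

loss = render_loss + dynamics_loss   # joint objective embeds physics in particle features
opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's setting, supervision would presumably come from multiple camera views and longer rollouts; the single-step, two-term form above is only meant to show how both objectives share and shape the particle latents.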
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 13130