Keywords: World Models; Generative Models
TL;DR: We propose Geometrically-Regularized World Models (GRWM), a plug-and-play regularizer that improves latent structure and enables state-of-the-art long-horizon fidelity in deterministic 3D environments.
Abstract: A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future physical state of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. However, existing models remain fragile and struggle to make reliable long-horizon predictions. In this work, we take a step toward building a truly accurate world model by addressing a fundamental yet open problem: constructing a model that can fully clone and overfit to a deterministic 3D world. Exteroceptive sensory inputs, such as images, are high-dimensional and related to the underlying physical processes through complex, nonlinear mappings, which makes precise latent-state prediction challenging. Overcoming this requires a representation space that faithfully captures the underlying physical states while minimizing information loss and noise. Such a representation simplifies the subsequent dynamics modeling task, so representation quality is critical to overall world model accuracy. We propose Geometrically-Regularized World Models (GRWM), a regularizer that enforces that consecutive points along a natural sensory trajectory remain close in latent space. This approach yields significantly improved latent representations that align closely with the true topology of the environment. Our method is plug-and-play, requires only minimal architectural modification, and scales naturally with trajectory length. It applies broadly across latent generative backbones and achieves state-of-the-art fidelity on long-horizon prediction benchmarks. Both qualitative and quantitative analyses show that its success comes from learning a latent space with superior geometric structure.
Primary Area: generative models
Submission Number: 4929
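
The abstract describes the regularizer only in words: consecutive points along a sensory trajectory should remain close in latent space. The sketch below illustrates one way such a penalty could be computed. It is not the authors' loss, whose exact form the abstract does not give. The function names, the squared Euclidean distance, and the `weight` parameter are all assumptions made for illustration.

```python
import numpy as np


def temporal_closeness_penalty(latents: np.ndarray) -> float:
    """Mean squared Euclidean distance between consecutive latent states.

    latents: array of shape (T, D), one latent vector per time step.
    Hypothetical helper; the distance used by GRWM is not stated in the abstract.
    """
    if latents.ndim != 2 or latents.shape[0] < 2:
        raise ValueError("expected shape (T, D) with T >= 2")
    steps = latents[1:] - latents[:-1]  # (T-1, D) differences between neighbours
    return float(np.mean(np.sum(steps ** 2, axis=1)))


def regularized_loss(base_loss: float, latents: np.ndarray, weight: float = 0.1) -> float:
    """Add the penalty to an existing objective (e.g. reconstruction).

    `weight` is an assumed hyperparameter, not a value from the paper.
    """
    return base_loss + weight * temporal_closeness_penalty(latents)


# A trajectory that moves in even steps versus one that jumps back and forth.
smooth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
jumpy = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
smooth_penalty = temporal_closeness_penalty(smooth)
jumpy_penalty = temporal_closeness_penalty(jumpy)
```

The jumpy trajectory receives a larger penalty than the smooth one, which is the behavior the abstract asks of the latent space. Taken alone, a term like this is minimized by mapping every observation to the same point, so it would only make sense alongside a base objective that preserves information, which is why the sketch exposes it as an additive term.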