PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation

ICLR 2026 Conference Submission16437 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: World Modeling, Dynamics Modeling, Robotic Manipulation
Abstract: Humans anticipate, from a glance and a contemplated action of their bodies, how the 3D world will respond. This predictive ability is equally vital for enabling robots to manipulate and interact with the physical world. We introduce PointWorld, a foundation 3D world model that unifies state and action in a shared spatial domain and predicts 3D point flow over short horizons: given one or a few RGB-D images and a sequence of robot actions, PointWorld forecasts per-point scene displacements that respond to those actions. To train our 3D world model, we curate a large-scale dataset for 3D dynamics learning spanning real and simulated robotic manipulation in diverse open-world environments, enabled by recent advances in 3D vision and simulation, and totaling about 2M trajectories and 500 hours. Through rigorous, large-scale empirical studies of backbones, action representations, learning objectives, data mixtures, domain transfers, and scaling, we distill design principles for large-scale 3D world modeling. PointWorld enables zero-shot simulation from in-the-wild RGB-D captures. It also powers model-based planning and control on real hardware that generalizes across diverse objects and environments, all without task-specific demonstrations or training.
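The abstract specifies the model's interface: RGB-D observations back-projected to 3D points, plus a robot action sequence, mapped to per-point displacements over a short horizon. The sketch below illustrates only the input/output shapes of such an interface; the function name, the 7-DoF action dimensionality, and the zero-displacement placeholder dynamics are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def predict_point_flow(points, actions):
    """Hypothetical interface sketch (not the paper's model).

    points:  (N, 3) scene points back-projected from one or more RGB-D frames
    actions: (T, A) sequence of robot actions over a short horizon
    returns: (T, N, 3) predicted per-point displacements at each step
    """
    T, N = actions.shape[0], points.shape[0]
    # Placeholder dynamics: zero displacement stands in for the learned model.
    return np.zeros((T, N, 3))

# Rolling out: cumulative displacements give future point positions,
# which is what enables simulation from a single in-the-wild capture.
points = np.random.rand(1024, 3)          # N = 1024 scene points
actions = np.random.rand(8, 7)            # T = 8 steps of 7-DoF actions (assumed)
flow = predict_point_flow(points, actions)
future_points = points[None] + np.cumsum(flow, axis=0)  # (8, 1024, 3)
```

With this interface, model-based planning reduces to scoring candidate action sequences by the point configurations their predicted flows produce.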
Primary Area: applications to robotics, autonomy, planning
Submission Number: 16437