Delta-Triplane Transformers as Occupancy World Models

ICLR 2026 Conference Submission12583 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: occupancy world models, triplane, multi-scale, autoregression
Abstract: Occupancy World Models (OWMs) aim to predict future scenes via 3D voxelized representations of the environment to support intelligent motion planning. Existing approaches typically generate full future occupancy states from VAE-style latent encodings. In contrast, we propose Delta-Triplane Transformers (DTT), a novel 4D OWM for autonomous driving. DTT adopts temporal triplanes as the occupancy representation and models changes in occupancy rather than full states. The core insight is that changes in the compact 3D latent space are naturally sparser and easier to model, enabling higher accuracy with a lighter-weight architecture. We first pretrain a triplane representation model that encodes 3D occupancy compactly, then extract multi-scale motion features from historical data and iteratively predict future triplane deltas. These deltas are combined with past states to decode future occupancy and ego-motion trajectories. Extensive experiments show that DTT achieves a state-of-the-art mean IoU of 30.85, reduces the mean absolute planning error to 1.0 meter, and runs in real time at 26 FPS on an RTX 4090. Demo videos and code are provided in the supplementary material.
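The delta-based rollout described in the abstract (predict a change, add it to the most recent state, feed the result back in) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `rollout_deltas` and `last_difference` are hypothetical, a flat list of floats stands in for a triplane latent, and the toy predictor simply repeats the last observed change in place of a learned transformer.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # assumption: stand-in for a flattened triplane latent


def rollout_deltas(
    history: Sequence[Latent],
    predict_delta: Callable[[Sequence[Latent]], Latent],
    steps: int,
) -> List[Latent]:
    """Autoregressively roll out future latents by accumulating predicted deltas."""
    if not history:
        raise ValueError("history must contain at least one latent state")
    window = [list(s) for s in history]  # copy so the caller's data is untouched
    futures: List[Latent] = []
    for _ in range(steps):
        delta = predict_delta(window)
        # Next state = most recent state + predicted change.
        nxt = [s + d for s, d in zip(window[-1], delta)]
        futures.append(nxt)
        # Slide the window: drop the oldest state, append the new prediction.
        window = window[1:] + [nxt]
    return futures


def last_difference(window: Sequence[Latent]) -> Latent:
    """Toy predictor (hypothetical, not the paper's model): repeat the last change."""
    if len(window) < 2:
        return [0.0] * len(window[-1])
    return [b - a for a, b in zip(window[-2], window[-1])]


history = [[0.0, 1.0], [1.0, 1.0]]
futures = rollout_deltas(history, last_difference, steps=3)
print(futures)  # [[2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
```

In this sketch only the first component changes between the two history frames, so the predicted delta is zero in the second component at every step. This loosely mirrors the sparsity argument in the abstract: a predictor over deltas only has to account for what moves.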
Supplementary Material: zip
Primary Area: optimization
Submission Number: 12583