Keywords: Occupancy, World Model, Autonomous Driving
Abstract: Temporal information is crucial in 3D occupancy prediction. Traditional methods fuse multi-frame features through a pipeline of perception, alignment, and fusion, but they overlook the coherence of static elements and the motion patterns of dynamic elements in 3D scenes. More recent methods reformulate 3D prediction as 4D prediction from current sensor inputs by modeling the continuous evolution of the scene. However, they refine the physical properties of dynamic elements in discrete steps across multiple encoding-decoding processes, which leads to cumulative errors and poor adaptation to dynamic motion. Inspired by non-equilibrium thermodynamics, we propose an Evolutionary Entropy Flow framework that uses Evolutionary Entropy as the carrier of continuous scene evolution and models the motion of dynamic elements as the flow of Evolutionary Entropy. We further introduce the Gaussian Entropy Flow World Model (GaussEFW), which represents Evolutionary Entropy Flow as a single, continuous Gaussian Entropy Flow in latent space, in contrast to the discrete refinements of multiple encoding-decoding processes. By predicting the Gaussian Entropy Flow from current RGB observations, GaussEFW predicts the motion of dynamic elements and learns continuous scene evolution. Extensive experiments on the nuScenes dataset validate the effectiveness of GaussEFW, showing superior performance on dynamic-element prediction and strong overall performance.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 25321