Keywords: Autonomous Driving, End-to-End, World Model
Abstract: The comprehensive scene understanding that world models provide has significantly improved the planning accuracy of end-to-end autonomous driving frameworks. However, redundant modeling of static regions and a lack of deep interaction with trajectories keep world models from reaching their full effectiveness. In this paper, we propose the Temporal Residual World Model (TR-World), which focuses on modeling dynamic objects. By computing the temporal residuals of BEV features, information about dynamic objects can be extracted without relying on detection and tracking. TR-World takes only temporal residuals as input, which allows it to predict the future spatial distribution of dynamic objects more precisely. Combining this prediction with the static object information contained in the current BEV features yields accurate future BEV features. Furthermore, we propose the Future-Guided Trajectory Refinement (FGTR) module, which lets prior trajectories (predicted from the current scene representations) interact with the future BEV features. This enables effective use of future road conditions and also alleviates world model collapse. Comprehensive experiments on the nuScenes and NAVSIM datasets demonstrate that our method, named ResWorld, achieves state-of-the-art planning accuracy. Code will be made publicly available.
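The idea of a temporal residual can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `temporal_residual`, the toy single-channel grids, and the assumption that both BEV frames are already aligned to the same ego frame are all hypothetical choices made for illustration. The sketch shows only that subtracting consecutive BEV grids cancels static content and leaves non-zero values where something moved.

```python
def temporal_residual(bev_prev, bev_curr):
    """Element-wise difference of two aligned BEV grids (lists of rows)."""
    return [[c - p for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(bev_prev, bev_curr)]

# Toy single-channel 3x4 BEV grids, ASSUMED to be aligned to the same ego frame.
# 5.0 marks static structure; 1.0 marks an object that moves one cell to the right.
bev_prev = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
]
bev_curr = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
]

residual = temporal_residual(bev_prev, bev_curr)

# Cells with a non-zero residual are the ones touched by motion;
# static cells cancel out without any detection or tracking step.
dynamic_mask = [[abs(v) > 1e-6 for v in row] for row in residual]
```

In this toy case the static rows produce an all-zero residual, while the row containing the moving object carries a negative value where the object was and a positive value where it now is.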
Primary Area: applications to robotics, autonomy, planning
Submission Number: 4430