Dream to Drive: Learning Conditional Driving Policies in Imagination

Published: 01 Jan 2024, Last Modified: 14 May 2025, ITSC 2024, CC BY-SA 4.0
Abstract: Learning driving policies for autonomous vehicles via reinforcement learning (RL) offers a way to learn optimal driving behavior directly from sensor data. However, no reward function has yet been designed that yields a driving policy that works in every situation. Instead, different reward functions must be used for different situations. Model predictive control (MPC) can accommodate such changes, but RL-based approaches must be retrained whenever the reward function changes. We suggest a different direction: a model-based RL agent that learns a conditional driving policy by simulating behavior for many different reward functions in imagination using a world model. To do so, we randomly sample parameters that shape the reward function and optimize an actor-critic policy that is conditioned on these parameters. We evaluate our approach in CARLA and demonstrate that it combines the flexibility of MPC with the long-term capabilities and execution speed of RL.
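The idea of conditioning on sampled reward parameters can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighted-sum form of the reward, the parameter bounds, and conditioning by simple concatenation are all assumptions made for illustration, and the world model and the actor-critic update are omitted entirely.

```python
import random

def sample_reward_params(rng, bounds):
    """Draw one reward-parameter vector, uniformly within each (low, high) pair."""
    return [rng.uniform(low, high) for low, high in bounds]

def parameterized_reward(features, params):
    """Assumed reward form: weighted sum of per-step features, weights = sampled parameters."""
    return sum(f * p for f, p in zip(features, params))

def policy_input(latent_state, params):
    """Assumed conditioning: the policy input is the state concatenated with the parameters."""
    return list(latent_state) + list(params)

rng = random.Random(0)
bounds = [(0.0, 1.0), (0.0, 1.0)]   # hypothetical ranges, e.g. weights on speed and comfort terms

params = sample_reward_params(rng, bounds)   # resampled for each imagined rollout
latent_state = [0.0, 0.0, 0.0, 0.0]          # stand-in for a world-model latent state
features = [1.0, -0.5]                       # stand-in per-step reward features

x = policy_input(latent_state, params)       # what a conditional actor and critic would receive
r = parameterized_reward(features, params)   # reward used to train under these parameters
```

Because the same parameter vector shapes the reward and is fed to the policy, changing the parameters at execution time would select a different behavior without retraining, which is the flexibility the abstract attributes to the approach.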