Learning to Drive with Two Minds: A Competitive Dual-Policy Approach in Latent World Models

ICLR 2026 Conference Submission 19248 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: autonomous driving, world model, imitation learning, reinforcement learning
TL;DR: We propose a dual-policy framework that uses a latent world model to combine imitation and reinforcement learning for autonomous driving, improving generalization and performance on challenging scenarios without external simulators.
Abstract: End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoDrive, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoDrive introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at an anonymous repository.
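To make the competition mechanism concrete, below is a minimal PyTorch sketch of one plausible instantiation, under assumptions not stated in the abstract: a frozen critic from the latent world model scores each head's action, the higher-scoring head acts as per-sample teacher for the other, and detached teacher targets keep the two heads from pushing gradients through each other. Every name here (pi_il, pi_rl, critic, competitive_step, distill_w, and the dimensions) is hypothetical; this is an illustrative sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, ACTION_DIM = 128, 2  # assumed latent and action sizes

# Two policy heads over world-model latents, plus an assumed frozen critic Q(z, a).
pi_il = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM))
pi_rl = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM))
critic = nn.Sequential(nn.Linear(LATENT_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
for p in critic.parameters():
    p.requires_grad_(False)  # critic is treated as fixed in this sketch

opt = torch.optim.Adam([*pi_il.parameters(), *pi_rl.parameters()], lr=3e-4)

def competitive_step(z, expert_a, distill_w=0.5):
    """One update on a batch of world-model latents z and expert actions."""
    a_il, a_rl = pi_il(z), pi_rl(z)

    # Base losses: the IL head imitates the expert; the RL head maximizes the critic.
    loss_il = F.mse_loss(a_il, expert_a)
    loss_rl = -critic(torch.cat([z, a_rl], dim=-1)).mean()

    # Competition: per sample, the head whose action the critic scores higher
    # becomes the teacher. detach() blocks gradients through the teacher, so
    # neither head's update flows into the other (no gradient conflicts).
    with torch.no_grad():
        q_il = critic(torch.cat([z, a_il], dim=-1)).squeeze(-1)
        q_rl = critic(torch.cat([z, a_rl], dim=-1)).squeeze(-1)
        il_wins = (q_il > q_rl).float().unsqueeze(-1)

    distill = (il_wins * (a_rl - a_il.detach()) ** 2 +        # IL teaches RL
               (1 - il_wins) * (a_il - a_rl.detach()) ** 2)   # RL teaches IL
    loss = loss_il + loss_rl + distill_w * distill.mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with dummy tensors standing in for world-model rollouts:
z = torch.randn(32, LATENT_DIM)
expert_a = torch.randn(32, ACTION_DIM)
competitive_step(z, expert_a)
```

One design point this sketch makes explicit: because the losing head only receives a distillation target (a detached action), the two objectives never compete for the same gradients, which is one way a framework could let IL and RL exchange knowledge without the interference a naively summed joint loss would cause.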
Primary Area: applications to robotics, autonomy, planning
Submission Number: 19248