Anchor-Guided Behavior Cloning with Offline Reinforcement Learning for Robust Autonomous Driving

19 Sept 2025 (modified: 18 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Autonomous Driving, Behavior Cloning, Offline Reinforcement Learning, World Models, Trajectory Prediction
TL;DR: ABC-RL is a hybrid framework combining Anchor-guided Behavior Cloning with offline Reinforcement Learning under a learned world model to achieve robust and accurate autonomous driving policies.
Abstract: Learning a robust driving policy from logged data is challenging due to the distribution shift between open-loop training and closed-loop deployment. We propose ABC-RL, a hybrid framework that integrates Anchor-guided Behavior Cloning (ABC) with offline Reinforcement Learning (RL) under a single-step world model to address this issue. A key innovation of our method is anchor-based behavior cloning, which introduces dynamics-aware intermediate trajectory targets. These anchor points normalize trajectories across different speeds and driving styles, enabling more accurate trajectory prediction and improving generalization to diverse driving scenarios. In addition, we leverage a learned world model to support offline RL: given the current state and action, the world model predicts the next state, which is then encoded to estimate the reward, allowing effective policy learning without environment interaction. This model-assisted training process enhances learning efficiency and stability in offline settings. To evaluate the effectiveness of ABC-RL, we perform open-loop assessments and develop a closed-loop simulation benchmark built on the nuScenes dataset, enabling a comprehensive evaluation of planning stability and safety. Our method achieves state-of-the-art performance, significantly outperforming behavior cloning baselines in both open-loop and closed-loop evaluations. Notably, ABC-RL reduces open-loop trajectory error from 0.29 m to 0.22 m and cuts closed-loop collision rates by over 57%, demonstrating the practical benefits of integrating trajectory-level supervision with model-assisted offline policy refinement. Our findings highlight the potential of ABC-RL with learned world models, offering a scalable and robust solution for real-world autonomous driving.
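The abstract does not specify how the dynamics-aware anchors are constructed; the sketch below assumes one plausible reading, where anchors are the logged future trajectory resampled at equal arc-length fractions, so that a slow and a fast traversal of the same path produce identical targets. The function names `arc_length_anchors` and `abc_loss` are illustrative, not the paper's API.

```python
# Minimal sketch of anchor-guided behavior cloning targets, assuming
# arc-length-normalized anchors as the "dynamics-aware" normalization.
import torch

def arc_length_anchors(traj: torch.Tensor, num_anchors: int) -> torch.Tensor:
    """Resample a future trajectory at equal arc-length fractions.

    traj: (T, 2) logged future waypoints (x, y) in the ego frame.
    Returns (num_anchors, 2) anchor points that are invariant to speed.
    """
    deltas = traj[1:] - traj[:-1]                                   # segment vectors
    seg_len = deltas.norm(dim=-1)                                   # segment lengths
    cum_len = torch.cat([seg_len.new_zeros(1), seg_len.cumsum(0)])  # (T,) arc length
    total = cum_len[-1].clamp(min=1e-6)
    # Target arc lengths at fractions 1/K, 2/K, ..., 1 of the full path.
    targets = torch.linspace(0, 1, num_anchors + 1)[1:] * total
    # Find the segment containing each target length, then interpolate.
    idx = torch.searchsorted(cum_len, targets).clamp(1, traj.shape[0] - 1)
    t0, t1 = cum_len[idx - 1], cum_len[idx]
    w = ((targets - t0) / (t1 - t0).clamp(min=1e-6)).unsqueeze(-1)
    return traj[idx - 1] * (1 - w) + traj[idx] * w

def abc_loss(pred_anchors: torch.Tensor, logged_traj: torch.Tensor) -> torch.Tensor:
    """L2 behavior-cloning loss against speed-normalized anchor targets."""
    target = arc_length_anchors(logged_traj, pred_anchors.shape[0])
    return torch.nn.functional.mse_loss(pred_anchors, target)
```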
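Similarly, the model-assisted offline RL step can be pictured as follows. This is a hedged sketch of the mechanism the abstract describes (single-step next-state prediction, reward estimated from the encoded next state); `world_model`, `reward_head`, `value_fn`, and the bootstrapped objective are assumed components, not details confirmed by the paper.

```python
# One hypothetical policy-update step using the learned single-step world
# model, so that no environment interaction is required.
import torch

def offline_rl_step(policy, world_model, reward_head, value_fn,
                    states: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Return a policy loss estimated entirely under the learned model."""
    actions = policy(states)                    # propose actions for logged states
    next_latent = world_model(states, actions)  # predict (an encoding of) the next state
    rewards = reward_head(next_latent)          # reward estimated from that encoding
    # Maximize model-predicted reward plus a bootstrapped value of the
    # predicted next state; gradients flow to the policy through the model.
    returns = rewards + gamma * value_fn(next_latent)
    return -returns.mean()
```

In a full training loop, this model-based objective would be combined with the anchor-based BC loss above, matching the paper's framing of trajectory-level supervision plus model-assisted offline policy refinement.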
Primary Area: applications to robotics, autonomy, planning
Submission Number: 15433