From Game-Playing to Self-Driving: Comparing AlphaGo vs AlphaZero Approaches for Driving Controls

Published: 13 Jun 2025 · Last Modified: 27 Jun 2025 · RL4RS 2025 · CC BY 4.0
Keywords: reinforcement learning, Monte Carlo tree search, autonomous driving, continuous control, AlphaZero
TL;DR: In this work, we compare AlphaGo-style methods, initialized from human policies, with AlphaZero-style learning from scratch on a realistic driving control task.
Abstract: We compare AlphaGo-style methods, which are initialized from human policies, with AlphaZero-style learning from scratch on real-world control tasks. While AlphaZero achieved superior performance in Go without human data, we hypothesize that in human-centered environments, human policies can encode safety constraints and behavioral priors that are difficult to capture in reward functions alone. We evaluate both approaches on a realistic driving simulator, using a PID controller as our human-level baseline. Our results show that human-guided Monte Carlo Tree Search (MCTS) significantly outperforms the PID controller, achieving 23% higher rewards. Importantly, standard AlphaZero Continuous fails to converge due to exploration instability. We identify two key components for stable convergence: guided exploration and guided rollouts. These findings suggest that human priors may provide crucial constraints for safe and efficient learning in real-world reinforcement learning applications.
Submission Number: 16
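
The abstract names two stabilizing components, guided exploration and guided rollouts, without spelling out their mechanics. The sketch below illustrates one plausible reading on a toy 1-D tracking task: candidate actions are sampled around a PID prior rather than uniformly over the continuous action space, and leaf states are evaluated by rolling out the PID policy. Everything here is an assumption for illustration: the toy dynamics, the PD gains, the sampling spread, and the depth-one search are all hypothetical stand-ins, since the paper's actual simulator, controller, and MCTS details are not given in the abstract.

```python
import math
import random

# Toy 1-D task (assumption): drive position x to 0 by choosing an
# acceleration. Stands in for the paper's realistic driving simulator.
def step(state, action, dt=0.1):
    x, v = state
    v = v + action * dt
    x = x + v * dt
    reward = -(x * x + 0.1 * action * action)  # quadratic tracking cost
    return (x, v), reward

def pid_policy(state, kp=2.0, kd=1.5):
    """Human-level baseline: a PD controller (gains are illustrative,
    standing in for the paper's PID controller)."""
    x, v = state
    return -kp * x - kd * v

def candidate_actions(state, n=5, spread=1.0):
    """Guided exploration (assumed reading): sample candidate actions
    around the PID prior instead of uniformly over the action space."""
    prior = pid_policy(state)
    return [prior + random.gauss(0.0, spread) for _ in range(n)]

def guided_rollout(state, depth=20, gamma=0.99):
    """Guided rollout (assumed reading): evaluate a leaf by following
    the PID policy rather than a random policy."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, r = step(state, pid_policy(state))
        total += discount * r
        discount *= gamma
    return total

def guided_search(root_state, n_sims=200, c_uct=1.0, gamma=0.99):
    """Depth-one UCT search over PID-guided candidate actions; a
    simplification of full MCTS, kept shallow for readability."""
    children, N, Q = {}, {}, {}
    for a in candidate_actions(root_state):
        child_state, r = step(root_state, a)
        children[a] = (child_state, r)
        N[a], Q[a] = 0, 0.0
    for _ in range(n_sims):
        total_n = sum(N.values()) + 1
        # UCT selection over the sampled candidates
        a = max(children, key=lambda a: Q[a] +
                c_uct * math.sqrt(math.log(total_n) / (N[a] + 1)))
        child_state, r = children[a]
        value = r + gamma * guided_rollout(child_state)
        N[a] += 1
        Q[a] += (value - Q[a]) / N[a]  # incremental mean update
    return max(children, key=lambda a: Q[a])

print("chosen action:", guided_search((1.0, 0.0)))
```

In this reading, the PID prior plays the role AlphaGo's human-game policy network played in Go: it narrows exploration to plausible actions and supplies informative rollout values, which is consistent with the abstract's claim that these two ingredients are what let the search converge where from-scratch AlphaZero Continuous does not.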