TL;DR: Robust and naturalistic driving emerges from self-play in simulation at unprecedented scale
Abstract: Self-play has powered breakthroughs in two-player and multi-player games. Here we show that it is a surprisingly effective strategy in another domain: robust and naturalistic driving emerges entirely from self-play in simulation at unprecedented scale, 1.6 billion km of driving. This is enabled by Gigaflow, a batched simulator that can synthesize and train on 42 years of subjective driving experience per hour on a single 8-GPU node. The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. It outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. The policy is realistic when assessed against human references and achieves unprecedented robustness, averaging 17.5 years of continuous driving between incidents in simulation.
Lay Summary: We show that robust and naturalistic driving emerges from self-play in simulation at unprecedented scale. We built a simulator and training environment that allows us to simulate 1.6 billion km of driving (the distance from the sun past Saturn). A policy trained on this amount of simulated driving learns to drive, to communicate with other drivers, and to almost never collide, purely through self-play: it practices by driving among different versions of itself. Our driver generalizes to novel scenarios based on recorded real-world scenes and achieves state-of-the-art performance on three separate benchmarks.
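The scale figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the submission. It takes only the three quoted numbers (1.6 billion km, 42 years per hour, 17.5 years between incidents) from the text, and assumes standard reference constants for the length of a year, the astronomical unit, and Saturn's orbital radius. The speedup is presumably an aggregate over all simulated agents rather than a per-agent rate; that reading is an assumption.

```python
# Back-of-the-envelope check of the scale figures quoted in the abstract.
# Only the three quoted numbers come from the source; the physical constants
# are standard reference values and are assumptions of this sketch.

HOURS_PER_YEAR = 365.25 * 24      # Julian year, in hours
KM_PER_AU = 149_597_870.7         # astronomical unit, in km
SATURN_ORBIT_AU = 9.58            # approximate semi-major axis of Saturn's orbit

total_km = 1.6e9                  # simulated driving distance (from the abstract)
years_per_wall_hour = 42          # simulated experience per hour on one 8-GPU node
years_between_incidents = 17.5    # mean continuous driving between incidents

# Aggregate ratio of simulated driving time to wall-clock time.
speedup = years_per_wall_hour * HOURS_PER_YEAR

# Total simulated distance expressed in astronomical units.
total_au = total_km / KM_PER_AU

# Continuous driving between incidents, expressed in hours.
hours_between_incidents = years_between_incidents * HOURS_PER_YEAR

print(f"aggregate speedup over real time: {speedup:,.0f}x")
print(f"simulated distance: {total_au:.2f} AU (Saturn orbits at ~{SATURN_ORBIT_AU} AU)")
print(f"driving hours between incidents: {hours_between_incidents:,.0f}")
```

Under these assumptions the simulated distance exceeds Saturn's orbital radius, which is consistent with the "from the sun past Saturn" comparison.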
Primary Area: Reinforcement Learning->Deep RL
Keywords: Reinforcement Learning, Autonomy, Simulation, Driving, Self-play
Submission Number: 6859