Keywords: curriculum learning; reinforcement learning; autonomous driving
TL;DR: We introduce CL4AD, the first integration of curriculum learning into batched autonomous driving simulators, and demonstrate that adaptive scenario selection can cut training time by 77%.
Abstract: Batched simulators for autonomous driving have recently enabled the training of reinforcement learning agents on a massive scale, encompassing thousands of traffic scenarios and billions of interactions within a matter of days.
Although such high throughput feeds reinforcement learning algorithms data faster than ever, their sample efficiency has not kept pace:
the standard training scheme, domain randomization, samples scenarios uniformly and thus spends a vast number of interactions on cases that contribute little to learning.
Curriculum learning offers a remedy by adaptively prioritizing scenarios that matter most for policy improvement.
We present CL4AD, the first integration of curriculum learning into batched autonomous driving simulators by framing scenario selection as an unsupervised environment design problem.
We introduce utility functions that shape curricula based on success rates and the realism of the agent's behavior, in addition to existing regret-estimation functions.
Large-scale experiments on GPUDrive demonstrate that curriculum learning can reach a 99% success rate a billion steps earlier than domain randomization, reducing wall-clock time by 77%.
An ablation study under a fixed computational budget further shows that curriculum learning improves sample efficiency by 67% in reaching the same success rate.
To support future research, we release an implementation of CL4AD in GPUDrive.
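To make the idea of adaptive scenario prioritization concrete, the sketch below illustrates one common way a success-rate-based curriculum can be implemented: scoring each scenario by a learnability heuristic p(1 - p), so that scenarios the agent always solves or always fails are deprioritized in favor of borderline ones. This is an illustrative assumption, not the paper's exact utility function; the function name and scoring rule are hypothetical.

```python
import numpy as np

def scenario_sampling_probs(success_rates, eps=1e-3):
    """Turn per-scenario success rates into sampling probabilities.

    Illustrative learnability heuristic (not CL4AD's exact utility):
    score each scenario by p * (1 - p), which peaks at p = 0.5.
    Scenarios that are already mastered (p ~ 1) or currently hopeless
    (p ~ 0) receive low weight; eps keeps every scenario reachable.
    """
    p = np.asarray(success_rates, dtype=float)
    scores = p * (1.0 - p) + eps
    return scores / scores.sum()

# Example: three scenarios -- mastered, borderline, and unsolved.
probs = scenario_sampling_probs([1.0, 0.5, 0.0])
# The borderline scenario (p = 0.5) gets the highest sampling weight,
# so the curriculum concentrates interactions where learning happens.
```

In a batched simulator, such probabilities would be recomputed periodically from rollout statistics and used to draw the next batch of scenarios, in contrast to domain randomization's uniform draw.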
Primary Area: applications to robotics, autonomy, planning
Submission Number: 20411