ReCord: Replay Coordination for Safe and Robust Population-Based Training in Autonomous Driving

Hyeon-Chang Jeon; KyungJoong Kim

Published: 03 Jun 2026, Last Modified: 08 Jun 2026. Venue: AI4GOOD Workshop 2026 Regular. License: CC BY 4.0

Keywords: Multi-agent reinforcement learning, Autonomous Driving, Population-based Training, Zero-shot Coordination

TL;DR: ReCord trains autonomous driving policies on replayed, non-reactive partner trajectories, improving zero-shot coordination with long-tail partners and outperforming reactive population-based training in matrix-game and multi-agent driving experiments.

Abstract: Autonomous driving policies trained with self-play reinforcement learning (RL) can generalize to unseen scenarios, but they learn primarily by interacting with copies of the same policy. As a result, they may be unprepared for diverse and unfamiliar partner behaviors. This gap is safety-critical in autonomous driving, where other agents can be aggressive, non-reactive, or otherwise different from those seen during training. Population-based training (PBT) addresses this limitation by training the ego policy with diverse pre-trained partners. However, conventional PBT typically executes partner policies online during ego training, which makes them reactive to the ego policy; we refer to this standard setting as reactive-PBT. Because reactive partners adapt to the ego, the ego policy can come to rely on their yielding behavior. To address this limitation, we propose Replay Coordination (ReCord), which trains the ego policy on fixed trajectories replayed from a diverse partner population. By removing online partner adaptation, ReCord encourages robust coordination that does not rely on partners' yielding behavior. In both a matrix game and a multi-agent driving simulator, ReCord outperforms reactive-PBT, especially against non-reactive or weakly reactive partners, including replayed human trajectories, while remaining competitive under reactive evaluation.
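The contrast between reactive and replayed partners can be illustrated with a toy matrix game. The sketch below is not the authors' implementation and performs no training: the chicken-style payoff values, the `reactive_partner` rule, the randomly generated partner logs, and every function name are assumptions made for illustration. It only shows how an ego that always goes looks safe when the partner reacts online, and how fixed replayed actions expose the same behavior.

```python
import random

YIELD, GO = 0, 1

def ego_payoff(ego_action, partner_action):
    """Toy chicken-style payoff for the ego (hypothetical values)."""
    if ego_action == GO and partner_action == GO:
        return -10.0  # both go: collision
    if ego_action == GO:
        return 1.0    # ego passes while the partner yields
    return 0.0        # ego waits

def reactive_partner(prev_ego_action):
    """Assumed online partner: yields whenever the ego went on the previous step."""
    return YIELD if prev_ego_action == GO else GO

def record_trajectory(horizon, go_prob, rng):
    """Assumed offline log of partner actions; the ego is not in the loop."""
    return [GO if rng.random() < go_prob else YIELD for _ in range(horizon)]

def rollout_reactive(ego_policy, horizon):
    """Partner acts online and conditions on the ego's previous action."""
    total, prev_ego = 0.0, GO  # assumption: partner starts as if the ego had gone
    for t in range(horizon):
        partner_action = reactive_partner(prev_ego)
        ego_action = ego_policy(t)
        total += ego_payoff(ego_action, partner_action)
        prev_ego = ego_action
    return total

def rollout_replay(ego_policy, trajectory):
    """Partner actions are replayed from a fixed log and ignore the ego."""
    return sum(ego_payoff(ego_policy(t), a) for t, a in enumerate(trajectory))

def always_go(t):
    return GO

rng = random.Random(0)
# A small "population" of logged partners with different aggressiveness.
population = [record_trajectory(10, p, rng) for p in (0.2, 0.5, 0.8)]

reactive_return = rollout_reactive(always_go, 10)
replay_returns = [rollout_replay(always_go, traj) for traj in population]
print("reactive:", reactive_return)
print("replay:  ", replay_returns)
```

Under these assumptions the reactive partner always yields to the always-go ego, so the reactive return is the maximum possible, while each replayed step on which the logged partner goes costs the ego a collision. An ego trained only on the reactive signal would have no reason to learn caution.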

Submission Number: 66
