Socially-Attentive Policy Optimization in Multi-Agent Self-Driving System

Zipeng Dai; Tianze Zhou; Kun Shao; David Henry Mguni; Bin Wang; Jianye HAO

Published: 10 Sept 2022, Last Modified: 05 May 2023 | CoRL 2022 Poster | Readers: Everyone

Keywords: self-driving, social preference, multi-agent, reinforcement learning

TL;DR: We focus on improving the safety and efficiency of a multi-agent self-driving system by enabling vehicles to learn socially compatible behaviors.

Abstract: As increasing numbers of autonomous vehicles (AVs) are deployed, it is important to construct a multi-agent self-driving (MASD) system for navigating traffic flows of AVs. In an MASD system, AVs not only navigate to pursue their own goals but also interact with each other to prevent congestion and collisions, especially in scenarios such as intersections and lane merging. Multi-agent reinforcement learning (MARL) provides an appealing approach to generating safe and efficient actions for multiple AVs. However, current MARL methods are limited to scenarios in which agents interact in either a cooperative or a competitive fashion within one episode. Ordinarily, the agents' objectives are defined with a global or team reward function, which fails to capture the dynamic social preferences (SPs) and mixed motives that human drivers exhibit in traffic interactions. To this end, we propose a novel MARL method called Socially-Attentive Policy Optimization (SAPO), which incorporates: (a) a self-attention module that selects the most interactive traffic participant for each AV, and (b) a social-aware integration mechanism that integrates the objectives of interacting AVs by estimating their dynamic social preferences from their observations. SAPO addresses the problem of improving the safety and efficiency of MASD systems by enabling AVs to learn socially compatible behaviors. Simulation experiments show that SAPO successfully captures and exploits the variation in the SPs of AVs, achieving superior performance compared with baselines in MASD scenarios.
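
The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaled dot-product scoring, and the angle-based blend of own and partner rewards (a form often used for social value orientation) are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_interactive(query, keys):
    """Hypothetical helper: score each neighbor with scaled dot-product
    attention against the ego vehicle's query vector and return the index
    of the highest-weighted neighbor together with all attention weights."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    idx = max(range(len(weights)), key=weights.__getitem__)
    return idx, weights


def socially_integrated_reward(r_self, r_other, phi):
    """Hypothetical blend (assumed, not taken from the paper): phi = 0 is
    purely self-interested, phi = pi/2 weights only the partner's reward."""
    return math.cos(phi) * r_self + math.sin(phi) * r_other


# Toy ego query and three neighbor keys (made-up numbers).
query = [1.0, 0.0]
keys = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]
idx, weights = most_interactive(query, keys)
blended = socially_integrated_reward(r_self=1.0, r_other=5.0, phi=math.pi / 4)
```

In this toy setup the second neighbor (index 1) receives the largest attention weight, and its reward is then mixed into the ego vehicle's objective according to the assumed preference angle `phi`. In the method described above, the preference would be estimated from observations rather than fixed by hand.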

Student First Author: yes

Supplementary Material: zip
