Abstract: Interest in Reinforcement Learning (RL) has grown substantially, particularly in applying it to problems that require cooperation among many agents, such as self-driving cars, robot assistants, and warehouse robots. Multi-Agent Reinforcement Learning (MARL) has been applied with varying degrees of success in these cooperative environments, enabling two or more agents to be trained to work collaboratively toward a common goal. It is well established that training agents in self-play (SP) can produce emergent behaviours in which agents adopt different conventions to solve a problem; however, a mismatch in conventions can lead to sub-optimal or even disastrous results. In driving, for instance, adherence to a unified convention, such as driving on the left or the right, is crucial to prevent collisions. This work introduces a strategy that addresses convention mismatches by creating a population of agents with diverse conventions and learning to identify which convention should be adopted for a given group of agents.