DPM: Dual Preferences-based Multi-Agent Reinforcement Learning

Published: 17 Jun 2024, Last Modified: 02 Jul 2024
Venue: ICML 2024 Workshop MHFAIA (Poster)
License: CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Preference-based Reinforcement Learning, RLHF, RLAIF
Abstract: Multi-agent reinforcement learning (MARL) has demonstrated strong performance across various domains but still faces challenges in sparse-reward environments. Preference-based Reinforcement Learning (PbRL) offers a promising solution by leveraging human preferences to transform sparse rewards into dense ones. However, its application in MARL remains under-explored. We propose Dual Preferences-based Multi-Agent Reinforcement Learning (DPM), which extends PbRL to MARL by introducing preferences that compare not only trajectories but also individual agent contributions. Moreover, we introduce a novel method that uses Large Language Models (LLMs) to gather preferences, addressing the challenges of human-based preference collection. Experimental results in the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate significant performance improvements over baselines, indicating the efficacy of DPM in optimizing individual reward functions and enhancing performance in sparse-reward settings.
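The abstract's core mechanism, learning a dense reward function from pairwise preferences, is commonly formalized with a Bradley-Terry model in the PbRL literature. Below is a minimal sketch of that standard component, not DPM's exact formulation; the function names (`predicted_return`, `preference_loss`) and the trajectory representation are illustrative assumptions.

```python
import math

def predicted_return(reward_fn, trajectory):
    """Sum the learned per-step rewards over a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def preference_loss(reward_fn, traj_a, traj_b, label):
    """Bradley-Terry cross-entropy loss for one preference pair.

    label = 1.0 means traj_a was preferred (by a human or an LLM annotator),
    label = 0.0 means traj_b was preferred. Minimizing this loss pushes the
    learned reward to assign higher return to the preferred trajectory,
    turning sparse environment rewards into a dense learned signal.
    """
    r_a = predicted_return(reward_fn, traj_a)
    r_b = predicted_return(reward_fn, traj_b)
    # P(traj_a preferred) under the Bradley-Terry model (a sigmoid of the return gap)
    p_a = 1.0 / (1.0 + math.exp(r_b - r_a))
    eps = 1e-12  # numerical guard against log(0)
    return -(label * math.log(p_a + eps) + (1.0 - label) * math.log(1.0 - p_a + eps))
```

In a full PbRL pipeline this loss would be minimized over a dataset of annotated pairs with a neural reward model; DPM's "dual" aspect additionally compares individual agent contributions, which this single-trajectory sketch does not capture.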
Submission Number: 69