DPM: Dual Preferences-based Multi-Agent Reinforcement Learning

Published: 17 Jun 2024, Last Modified: 02 Jul 2024
Venue: ICML 2024 Workshop MHFAIA (Poster)
License: CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Preference-based Reinforcement Learning, RLHF, RLAIF
Abstract: Multi-agent reinforcement learning (MARL) has demonstrated strong performance across various domains but still faces challenges in sparse-reward environments. Preference-based Reinforcement Learning (PbRL) offers a promising solution by leveraging human preferences to transform sparse rewards into dense ones. However, its application in MARL remains under-explored. We propose Dual Preferences-based Multi-Agent Reinforcement Learning (DPM), which extends PbRL to MARL by introducing preferences that compare not only trajectories but also individual agent contributions. Moreover, we introduce a novel method that uses Large Language Models (LLMs) to gather preferences, addressing the challenges of human-based preference collection. Experimental results in the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate significant performance improvements over baselines, indicating the efficacy of DPM in optimizing individual reward functions and enhancing performance in sparse-reward settings.
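The abstract's core mechanism, learning a dense reward function from pairwise preferences, is commonly formalized with a Bradley-Terry model in the PbRL literature. Below is a minimal sketch of that standard component, not DPM's exact formulation; the function names (`predicted_return`, `preference_loss`) and the trajectory representation are illustrative assumptions.

```python
import math

def predicted_return(reward_fn, trajectory):
    """Sum the learned per-step rewards over a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def preference_loss(reward_fn, traj_a, traj_b, label):
    """Bradley-Terry cross-entropy loss for one preference pair.

    label = 1.0 means traj_a was preferred (by a human or an LLM annotator),
    label = 0.0 means traj_b was preferred. Minimizing this loss pushes the
    learned reward to assign higher return to the preferred trajectory,
    turning sparse environment rewards into a dense learned signal.
    """
    r_a = predicted_return(reward_fn, traj_a)
    r_b = predicted_return(reward_fn, traj_b)
    # P(traj_a preferred) under the Bradley-Terry model (a sigmoid of the return gap)
    p_a = 1.0 / (1.0 + math.exp(r_b - r_a))
    eps = 1e-12  # numerical guard against log(0)
    return -(label * math.log(p_a + eps) + (1.0 - label) * math.log(1.0 - p_a + eps))
```

In a full PbRL pipeline this loss would be minimized over a dataset of annotated pairs with a neural reward model; DPM's "dual" aspect additionally compares individual agent contributions, which this single-trajectory sketch does not capture.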
Submission Number: 69