Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction

Published: 30 Apr 2025, Last Modified: 30 Apr 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Multi-agent reinforcement learning faces greater challenges with efficient exploration than its single-agent counterpart, primarily due to the exponential growth of the state and action spaces. Methods based on intrinsic rewards have been shown to effectively enhance exploration efficiency in multi-agent scenarios. However, these methods suffer from instability during training and bias in the exploration direction. To address these challenges, we propose Farsighted Self-Direction (FSD), a novel model-free method that uses a long-term exploration bonus to achieve coordinated exploration. Since the prediction error against individual Q-values indicates a potential bonus for committed exploration, it is incorporated into action selection to directly guide coordinated exploration. We further use clipped double Q-learning to reduce noise in the prediction error. We validate the method on didactic examples and demonstrate the superior performance of our method on challenging StarCraft II micromanagement tasks.
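To make the abstract's mechanism concrete, below is a minimal sketch of the bonus-guided action selection it describes: two Q-networks are clipped via an element-wise minimum (clipped double Q-learning), a predictor network's error against those Q-values serves as the exploration bonus, and the bonus is added into greedy action selection. This is not the authors' FSD implementation; all names (`QNet`, `predictor`, `beta`) are hypothetical placeholders, and the actual method in the linked repository may differ substantially.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Simple per-agent Q-network over local observations (illustrative only)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

obs_dim, n_actions = 16, 5
beta = 0.5  # hypothetical weight on the exploration bonus

# Clipped double Q-learning: two Q-networks whose element-wise minimum
# damps overestimation noise in the value estimates.
q1, q2 = QNet(obs_dim, n_actions), QNet(obs_dim, n_actions)

# A separate predictor regresses toward the (clipped) Q-values; its
# prediction error acts as a long-term exploration bonus.
predictor = QNet(obs_dim, n_actions)

obs = torch.randn(1, obs_dim)  # one agent's local observation
with torch.no_grad():
    q_min = torch.min(q1(obs), q2(obs))        # clipped double Q-values
    bonus = (predictor(obs) - q_min).abs()     # per-action prediction error
    action = (q_min + beta * bonus).argmax(dim=-1)  # bonus-guided greedy action
print(action.item())
```

In this sketch, actions with large prediction error receive a larger bonus, so the agent is steered toward under-explored actions rather than relying on undirected noise such as epsilon-greedy.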
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/tcsoar/FSD
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 4334