Do not Start with Trembling Hands: Improving Multi-agent Reinforcement Learning with Stable Prefix Policy

24 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: MARL, Trembling Hands, Exploration, Exploration, Prefix Policy
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In multi-agent reinforcement learning (MARL), the $\epsilon$-greedy method plays an important role in balancing exploration and exploitation during the decision-making process in value-based algorithms. However, we find that $\epsilon$-greedy can be deemed as the concept of "trembling hands" in game theory when the agents are more in need of exploitation, which may result in the Trembling Hands Nash Equilibrium solution, a suboptimal policy convergence. Besides, eliminating the $\epsilon$-greedy algorithm leaves no exploration and may lead to unacceptable local optimal policies. To address this dilemma, we use the previously collected trajectories to plan an existing optimal template as candidate policy, which we call \textbf{Stable Prefix Policy}, in contrast to trembling hands. When the policy is close to the optimal policy, the agents follow the planned template, and when the policy still needs exploration, the agents will adaptively dropout. We scale our approach to various value-based MARL methods and empirically verify our method in a cooperative MARL task, SMAC benchmarks. Experimental results demonstrate that our method achieves not only better performance but also faster convergence speed than baseline algorithms within 2M time steps.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9370
Loading