Efficient Multi-Agent Cooperation Learning through Teammate Lookahead

TMLR Paper3677 Authors

13 Nov 2024 (modified: 21 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) is a rapidly growing research field that has achieved outstanding results across a variety of challenging cooperation tasks. However, existing MARL algorithms typically overlook the concurrent updates of teammate agents: an agent learns from data collected with one set of (current) teammates, but is then deployed alongside another set of (updated) teammates. This phenomenon, termed "teammate delay", creates a discrepancy between the agent's learning objective and the actual evaluation scenario, which can degrade learning stability and efficiency. In this paper, we tackle this challenge by introducing a lookahead strategy that enables agents to learn to cooperate with predicted future teammates, making them explicitly aware of concurrent teammate updates. The lookahead strategy is designed to integrate seamlessly with existing gradient-based MARL methods, enhancing their performance without significant modifications to their underlying structures. Extensive experiments demonstrate the effectiveness of this approach, showing that the lookahead strategy improves cooperation learning efficiency and achieves superior performance over state-of-the-art MARL algorithms.
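To make the lookahead idea concrete, below is a minimal, hypothetical sketch of one such update in a toy two-agent setting; it is not the paper's actual algorithm, whose details are not given in the abstract. The names (joint_return, theta_i, theta_j) and the scalar "policy parameters" are illustrative assumptions. The key step is that agent i simulates the teammate's own gradient step to obtain predicted future teammate parameters, and then takes its gradient against those predicted parameters rather than the current, soon-to-be-stale ones.

```python
import torch

torch.manual_seed(0)

# Toy differentiable cooperative objective: both agents want their scalar
# "policy parameters" to sum to a shared target -- a stand-in for the
# joint return J(theta_i, theta_j). (Assumption for illustration only.)
target = torch.tensor(1.0)

theta_i = torch.tensor(0.0, requires_grad=True)  # learning agent
theta_j = torch.tensor(0.5, requires_grad=True)  # teammate

lr = 0.1

def joint_return(ti, tj):
    # Higher is better; negate to obtain a loss.
    return -((ti + tj) - target) ** 2

# Step 1: predict the teammate's next parameters by simulating its own
# gradient step. theta_i is detached here, so the prediction uses only the
# teammate's gradient; create_graph=True keeps the step differentiable.
teammate_loss = -joint_return(theta_i.detach(), theta_j)
(grad_j,) = torch.autograd.grad(teammate_loss, theta_j, create_graph=True)
theta_j_lookahead = theta_j - lr * grad_j  # predicted future teammate

# Step 2: agent i learns to cooperate with the predicted future teammate,
# instead of the current teammate it collected data with.
loss_lookahead = -joint_return(theta_i, theta_j_lookahead)
(grad_i,) = torch.autograd.grad(loss_lookahead, theta_i)

with torch.no_grad():
    theta_i -= lr * grad_i

print(f"agent step with lookahead: theta_i = {theta_i.item():.4f}")
```

Because the lookahead only wraps the loss computation around an extra simulated gradient step, a sketch like this slots into standard gradient-based MARL training loops without changing their structure, which is consistent with the plug-in integration the abstract describes.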
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bo_Dai1
Submission Number: 3677