QFuture: Learning Future Expectations in Multi-Agent Reinforcement Learning


22 Sept 2022, 12:31 (modified: 26 Oct 2022, 13:56). ICLR 2023 Conference Blind Submission.
Keywords: multi-agent reinforcement learning, future expectations learning, value decomposition, mutual information
TL;DR: future expectations learning
Abstract: Building accurate and robust value functions to estimate the expected future return from the current state is critical in Multi-Agent Reinforcement Learning. Previous works improve this estimation by strengthening the representation ability of the value function. However, because the future is uncertain and unobservable, directly estimating the future return from the current state is challenging and cannot be addressed by promoting representation ability alone. Humans, by contrast, derive future expectations from currently available information to help evaluate the long-term return of their behavior. Motivated by this, we propose a novel framework, called \textit{future expectation multi-agent Q-learning} (QFuture), for better estimating expected future returns. In this framework, we design a future expectation module (FEM) that builds future expectations into the computation of the individual action-value (IAV) and the joint action-value (JAV). In FEM, future expectations are modeled as random variables, and representation learning is performed by maximizing their mutual information (MI) with the future trajectory given the current observation (in IAV) or state (in JAV). We further design a future representation module (FRM) to encode the future trajectory, with a regularizer that ensures informativeness. Experiments on StarCraft II micromanagement tasks and Google Research Football demonstrate that QFuture achieves state-of-the-art performance.
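The abstract does not specify which MI estimator QFuture uses. A common way to maximize mutual information between a learned "expectation" variable and an encoding of the future trajectory is a contrastive InfoNCE-style lower bound, where matched (expectation, future) pairs from the same trajectory serve as positives and other batch entries as negatives. The sketch below is an illustrative assumption, not the paper's method; the function name and `temperature` parameter are invented for the example.

```python
import numpy as np

def infonce_lower_bound(expectation, future, temperature=0.1):
    """InfoNCE-style lower bound on MI between two batches of embeddings.

    expectation, future: (batch, dim) arrays; row i of each forms a
    positive pair, all other rows act as negatives.
    Returns a scalar in [0, log(batch)] (up to estimation noise).
    """
    # Normalize so the pairwise scores are cosine similarities.
    e = expectation / np.linalg.norm(expectation, axis=1, keepdims=True)
    f = future / np.linalg.norm(future, axis=1, keepdims=True)
    logits = e @ f.T / temperature  # (batch, batch) similarity matrix

    # Row-wise log-softmax; diagonal entries score the positive pairs.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # InfoNCE bound: mean positive log-probability plus log(batch size).
    return np.mean(np.diag(log_prob)) + np.log(len(e))
```

In training, the negative of this bound would be added to the TD loss so that the expectation variables become predictive of the encoded future; the future encoder (the FRM, in the paper's terminology) would supply the `future` embeddings.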
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)