Keywords: Multi-agent Reinforcement Learning, Predictive State Representation, Dynamic Interaction Graph
Abstract: We study reinforcement learning for partially observable multi-agent systems where each agent only has access to its own observation and reward and aims to maximize its cumulative rewards. To handle partial observations, we propose graph-assisted predictive state representations (GAPSR), a scalable multi-agent representation learning framework that leverages the agent connectivity graphs to aggregate local representations computed by each agent. In addition, our representations are readily able to incorporate dynamic interaction graphs and kernel space embeddings of the predictive states, and thus have strong flexibility and representation power. Based on GAPSR, we propose an end-to-end MARL algorithm that simultaneously infers the predictive representations and uses the representations as the input of a policy optimization algorithm. Empirically, we demonstrate the efficacy of the proposed algorithm provided on both a MAMuJoCo robotic learning experiment and a multi-agent particle learning environment.
One-sentence Summary: We propose a new algorithm for MARL under a multi-agent predictive state representation model, where we incorporate a dynamic interaction graph; we provide the theoretical guarantees of our model and run various experiments to support our algorithm.
Supplementary Material: zip