- Keywords: Multi-Agent Reinforcement Learning, Hierarchical Multi-Agent Reinforcement Learning, Implicit Deep Learning, Differentiable Optimization
- Abstract: Training a multi-agent reinforcement learning (MARL) model with a sparse reward is notoriously difficult because the final outcome (i.e., success or failure) is induced by numerous combinations of interactions among agents. Earlier studies have tried to resolve this issue by using hierarchical MARL to decompose the main task into subproblems or by employing an intrinsic reward to induce the interactions needed for learning an effective policy. However, these methodologies have shown limited success. In this study, we propose LPMARL, a hierarchically structured policy that induces effective coordination among agents. At every step, LPMARL makes two hierarchical decisions: (1) solving an agent-task assignment problem and (2) solving a local cooperative game among the agents assigned to the same task. For the first step, LPMARL formulates the agent-task assignment problem as a resource-assignment linear program (LP), using a graph neural network (GNN) to generate state-dependent cost coefficients for the LP. The LP solution assigns agents to tasks, partitioning the agents into groups, each of which pursues its own sub-goal. For the lower-level decision, LPMARL employs a general MARL strategy to solve each sub-task. We train the GNN, which generates the state-dependent LP for high-level decisions, and the low-level cooperative MARL policy together end-to-end using the implicit function theorem. We empirically demonstrate that our algorithm outperforms existing algorithms in various mixed cooperative-competitive environments.
- One-sentence Summary: This study proposes LPMARL, a linear-programming-based hierarchical MARL algorithm trained end-to-end with implicit deep learning using only a sparse reward.
- Supplementary Material: zip
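The high-level decision described in the abstract can be illustrated with a small sketch. This is a hypothetical reconstruction, not the paper's implementation: the cost matrix `c` stands in for the GNN-generated state-dependent coefficients (random here), and the per-task capacity `cap` is an assumed constraint. The LP is a transportation-style assignment solved with `scipy.optimize.linprog`; the end-to-end differentiation via the implicit function theorem is omitted.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of LPMARL's high-level agent-task assignment step.
# c[i, j]: cost of assigning agent i to task j (in LPMARL, produced by a
# GNN from the current state; random stand-in here).
rng = np.random.default_rng(0)
n_agents, n_tasks = 4, 2
c = rng.random((n_agents, n_tasks))

# Decision variables x[i, j] >= 0, flattened row-major.
# Equality constraints: each agent is assigned with total weight 1.
A_eq = np.zeros((n_agents, n_agents * n_tasks))
for i in range(n_agents):
    A_eq[i, i * n_tasks:(i + 1) * n_tasks] = 1.0
b_eq = np.ones(n_agents)

# Inequality constraints: each task accepts at most `cap` agents
# (an assumed capacity, playing the role of the resource limit).
cap = 2
A_ub = np.zeros((n_tasks, n_agents * n_tasks))
for j in range(n_tasks):
    A_ub[j, j::n_tasks] = 1.0
b_ub = np.full(n_tasks, cap)

res = linprog(c.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, 1), method="highs")
assignment = res.x.reshape(n_agents, n_tasks)
print(assignment)  # row i gives agent i's (fractional) task assignment
```

Because the constraint matrix is totally unimodular, the LP's vertex solutions are integral, so each agent ends up assigned to a single task; these groups then define the sub-games solved by the low-level MARL policy.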