Abstract: Reinforcement learning agents are trained on a distribution of environments, and the way these environments are defined directly affects the robustness and generalization of the learned policies. In single-agent reinforcement learning, this problem is often addressed by domain randomization, that is, randomizing the environment and tasks within the agent's desired operating domain. The challenge is to generate environments that are both structured and solvable, so that they guide the agent's learning process. Recent works have sought to produce such environments under the Unsupervised Environment Design (UED) formulation. However, these methods lead to a proliferation of adversarial agents needed to train one agent, for a single-agent problem in a discretized task domain. In this work, we aim to automatically generate environments that are solvable and challenging for the continuous multi-agent setting. We base our solution on the Teacher-Student relationship with parameter-sharing $\textit{Students}$, where we re-imagine the $\textit{Teacher}$ as an environment generator for UED. Our approach uses one environment generator agent ($\textit{Teacher}$) for any number of learning agents ($\textit{Students}$). We demonstrate qualitatively and quantitatively that, for multi-agent ($\geq 8$ agents) navigation and steering, $\textit{Students}$ trained by our approach outperform agents using heuristic search, as well as agents trained by domain randomization. Our code is available at https://github.com/GAIDG-Lab/MASAI.