Keywords: multi-agent reinforcement learning, task generalisation, knowledge transfer
TL;DR: We extend world value functions to the multi-agent setting to achieve provably optimal task generalisation in multi-agent reinforcement learning.
Abstract: While task generalisation is widely studied in single-agent reinforcement learning (RL), it has received little attention in multi-agent RL (MARL). The existing work typically treats task generalisation implicitly, as part of the environment, and the work that treats it explicitly offers no theoretical guarantees. We propose Goal-Oriented Learning for Multi-Task Multi-Agent RL (GOLeMM), a method that achieves provably optimal task generalisation, which, to the best of our knowledge, has not been achieved before in MARL. After learning an optimal goal-oriented value function for a single arbitrary task, our method can zero-shot infer the optimal policy for any other task in the distribution, given only the terminal rewards of each agent under the new task and the learnt task. Empirically, we show that our method generalises over a full task distribution, while representative baselines are able to learn only a small subset of it.
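As a rough illustration of the zero-shot transfer step described above, the sketch below assumes a tabular goal-oriented value function Q(s, g, a) over a finite goal set, and that tasks differ only in their terminal rewards. The function name, array shapes, and the additive correction rule are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def zero_shot_policy(Q_bar, r_new, r_learnt):
    """Hypothetical zero-shot inference from a learnt goal-oriented value function.

    Q_bar:    array of shape (S, G, A) -- learnt goal-oriented values Q(s, g, a)
    r_new:    array of shape (G,)      -- terminal reward per goal, new task
    r_learnt: array of shape (G,)      -- terminal reward per goal, learnt task
    """
    # Shift each goal's values by the change in terminal reward, then act
    # greedily over goals and actions (one plausible reading of how knowledge
    # of both tasks' terminal rewards enables transfer; the paper's rule may differ).
    Q_new = Q_bar + (r_new - r_learnt)[None, :, None]
    return Q_new.max(axis=1).argmax(axis=1)  # greedy action index per state
```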
Submission Number: 24