Offline Multi-Agent Reinforcement Learning with Knowledge Distillation

Wei-Cheng Tseng; Tsun-Hsuan Wang; Yen-Chen Lin; Phillip Isola

Published: 31 Oct 2022, Last Modified: 22 Oct 2022

NeurIPS 2022 Accept

Readers: Everyone

Keywords: offline multi-agent reinforcement learning, multi-agent, offline reinforcement learning

Abstract: We introduce an offline multi-agent reinforcement learning (offline MARL) framework that utilizes previously collected data without additional online data collection. Our method reformulates offline MARL as a sequence modeling problem and thus builds on the simplicity and scalability of the Transformer architecture. In the fashion of centralized training and decentralized execution, we propose to first train a teacher policy as if the MARL dataset were generated by a single agent. After the teacher policy has identified and recombined the "good" behavior in the dataset, we create separate student policies and distill into them not only the teacher policy's features but also the structural relations among different agents' features. Despite its simplicity, the proposed method outperforms state-of-the-art model-free offline MARL baselines while being more robust to demonstration quality across several environments.
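
The distillation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss, so the choices here are assumptions, namely mean squared error for feature matching, pairwise cosine similarity as the "structural relation" between agents' features, and the hypothetical names `feature_loss`, `relation_loss`, `distillation_loss`, and `relation_weight`. Features are plain Python lists, one vector per agent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small constant avoids division by zero


def relation_matrix(features):
    """Pairwise similarity between every pair of agents' features (assumed relation)."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def feature_loss(teacher, student):
    """Mean squared error between teacher and student features, per agent."""
    total, count = 0.0, 0
    for t_vec, s_vec in zip(teacher, student):
        for a, b in zip(t_vec, s_vec):
            total += (a - b) ** 2
            count += 1
    return total / count


def relation_loss(teacher, student):
    """Mean squared error between teacher and student relation matrices."""
    rel_t = relation_matrix(teacher)
    rel_s = relation_matrix(student)
    n = len(teacher)
    return sum(
        (rel_t[i][j] - rel_s[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


def distillation_loss(teacher, student, relation_weight=1.0):
    """Feature matching plus weighted relation matching (hypothetical weighting)."""
    return feature_loss(teacher, student) + relation_weight * relation_loss(
        teacher, student
    )


# Two agents with 2-dimensional features: the teacher keeps the agents distinct,
# while this student has collapsed both agents onto the same feature vector.
teacher_features = [[1.0, 0.0], [0.0, 1.0]]
student_features = [[1.0, 0.0], [1.0, 0.0]]
loss = distillation_loss(teacher_features, student_features)
```

In this toy case the relation term penalizes the student for making both agents look alike even where an individual feature vector matches the teacher's, which is the intuition behind distilling inter-agent structure in addition to per-agent features.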

TL;DR: We propose an offline learning framework for multi-agent scenarios that outperforms previous state-of-the-art offline multi-agent frameworks.

Supplementary Material: pdf
