Decentralized Extension for Centralized Multi-Agent Reinforcement Learning via Online Distillation

Published: 01 Jan 2024 · Last Modified: 25 Jul 2025 · ICONIP (3) 2024 · CC BY-SA 4.0
Abstract: Recently, researchers have raised the performance of multi-agent reinforcement learning (MARL) algorithms to new heights by leveraging sequence models and the Transformer architecture. However, introducing the Transformer architecture into MARL naturally leads to centralized policies that receive information from all agents and issue instructions uniformly. Since a centralized controller is impractical in many real-world tasks, we introduce an online distillation architecture, OLEN, that extends this strong performance to fully decentralized scenarios. Our approach is applicable to any centralized MARL method. We perform online distillation simultaneously with policy improvement, which reduces interaction costs, enhances data efficiency, and enables student-aware distillation. A hypernetwork makes our approach both concise and scalable. Additionally, we incorporate a certainty metric and a dynamic coefficient to facilitate meaningful learning. To the best of our knowledge, this work is the first to introduce online distillation into MARL problems. Experiments demonstrate the strong performance of our method.
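The abstract does not specify how the certainty metric and dynamic coefficient enter the distillation objective. As a minimal sketch of what a certainty-weighted online distillation loss might look like in PyTorch, where a decentralized student imitates a centralized teacher's action distribution, note that the entropy-based certainty measure, the thresholding rule, and all names below are illustrative assumptions rather than the paper's actual formulation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_logits, student_logits, certainty_threshold=0.5):
    """Hypothetical online-distillation objective: each decentralized student
    matches the centralized teacher's action distribution, with a per-sample
    dynamic coefficient derived from the teacher's certainty."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    # Assumed certainty metric: one minus the normalized entropy of the
    # teacher's policy, so confident teacher decisions get more weight.
    entropy = -(teacher_probs * torch.log(teacher_probs + 1e-8)).sum(-1)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.shape[-1])))
    certainty = 1.0 - entropy / max_entropy  # in [0, 1]
    # Dynamic coefficient: zero out samples where the teacher is uncertain.
    coeff = torch.where(certainty > certainty_threshold, certainty,
                        torch.zeros_like(certainty))
    # Per-sample KL(teacher || student), weighted by the coefficient.
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  teacher_probs, reduction="none").sum(-1)
    return (coeff * kl).mean()

# Example usage: 4 agents, 6 discrete actions.
teacher = torch.randn(4, 6)
student = torch.randn(4, 6)
loss = distillation_loss(teacher, student)
```

In an online setting, this term would be added to the students' objective at every update alongside the teacher's policy-improvement loss, rather than being applied after training converges.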