MAST: A Sparse Training Framework for Multi-agent Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Sparse Training, Multi-Agent Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Deep Multi-Agent Reinforcement Learning (MARL) is often confronted with large state and action spaces, necessitating neural networks with extensive parameters and incurring substantial computational overhead. Consequently, there is a pronounced need for methods that accelerate training and enable model compression in MARL. However, existing training acceleration techniques are primarily tailored to single-agent scenarios, as compressing MARL agents into sparse models presents unique and intricate challenges. In this paper, we introduce an innovative Multi-Agent Sparse Training (MAST) framework. MAST capitalizes on gradient-based topology evolution to train multiple MARL agents exclusively with sparse networks. This is combined with a novel hybrid TD($\lambda$) scheme, coupled with the Soft Mellowmax operator, to establish reliable learning targets, particularly in sparse scenarios. Additionally, we employ a dual replay buffer mechanism to enhance policy stability within sparse networks. Remarkably, our comprehensive experimental investigation on the SMAC benchmarks reveals, for the first time, that deep multi-agent Q-learning algorithms manifest significant redundancy in terms of floating point operations (FLOPs). This redundancy translates into up to a $20$-fold reduction in FLOPs for both training and inference, accompanied by a commensurate level of model compression, all achieved with less than 3\% performance degradation.
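For readers unfamiliar with gradient-based topology evolution, the sketch below illustrates one common instantiation of the drop-and-grow mechanism the abstract refers to, in the spirit of RigL (Evci et al., 2020): connections with the smallest weight magnitude are pruned and connections with the largest dense-gradient magnitude are regrown, so the network stays sparse throughout training. The function and parameter names here are illustrative assumptions, not the exact criteria used by MAST, which are detailed in the paper.

```python
import numpy as np

def topology_evolution_step(weights, mask, dense_grads, drop_fraction=0.3):
    """One RigL-style drop-and-grow update of a layer's sparsity mask (illustrative sketch).

    weights     : np.ndarray, current sparse weight matrix (zeros where mask == 0)
    mask        : np.ndarray of {0, 1}, same shape, current connectivity pattern
    dense_grads : np.ndarray, gradients w.r.t. all weights (one dense backward pass)
    """
    n_active = int(mask.sum())
    n_swap = int(drop_fraction * n_active)

    # Drop: deactivate the active connections with the smallest weight magnitude.
    active_idx = np.flatnonzero(mask)
    drop_order = np.argsort(np.abs(weights.ravel()[active_idx]))
    dropped = active_idx[drop_order[:n_swap]]

    # Grow: activate the inactive connections with the largest gradient magnitude.
    inactive_idx = np.flatnonzero(1 - mask)
    grow_order = np.argsort(-np.abs(dense_grads.ravel()[inactive_idx]))
    grown = inactive_idx[grow_order[:n_swap]]

    # Apply the swap; newly grown connections start from zero.
    new_mask = mask.copy().ravel()
    new_mask[dropped] = 0
    new_mask[grown] = 1

    new_weights = weights.copy().ravel()
    new_weights[dropped] = 0.0
    new_weights[grown] = 0.0

    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)
```

Because the total number of active connections is preserved at every update, the per-agent networks remain at a fixed sparsity level throughout training, which is what enables the FLOP savings during both training and inference reported in the abstract.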
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7594