Safe Offline-to-Online Multi-Agent Decision Transformer: A Safety Conscious Sequence Modeling Approach

Aamir Bader Shah; Yu Wen; Jiefu Chen; Xuqing Wu; Xin Fu

Safe Offline-to-Online Multi-Agent Decision Transformer: A Safety Conscious Sequence Modeling Approach

Aamir Bader Shah, Yu Wen, Jiefu Chen, Xuqing Wu, Xin Fu

Published: 01 Jan 2024, Last Modified: 23 Mar 2025IROS 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: We introduce the Safe Offline-to-Online Multi-Agent Decision Transformer (SO2-MADT), an innovative framework that revolutionizes safety considerations in Multi-agent Reinforcement Learning (MARL) through a novel sequence modeling approach. Leveraging the dynamic capabilities inherent in Decision Transformers, our methodology seamlessly incorporates safety protocols as a cornerstone element, ensuring secure operations throughout both the offline pre-training phase and the adaptive online fine-tuning phase. At the core of our framework lie two pivotal innovations: the Safety-To-Go (STG) token, embedding safety at a macro level, and the Agent Prioritization Module (APM), facilitating explicit credit assignment at a micro level. Through extensive testing against the challenging environments of the StarCraft Multi-Agent Challenge (SMAC) and Multi-agent MuJoCo, our SO2-MADT not only excels in offline pre-training but also demonstrates superior performance during online fine-tuning, without any degradation in performance. The implications of our work provide a pathway for deployment in critical real-world applications where safety is paramount and non-negotiable. The code is available at https://github.com/shahaamirbader/SO2-MADT.

Loading