<div align="center">
<h1>
  MOSDT: Self-Distillation-Based Decision Transformer for Multi-Agent Offline Safe Reinforcement Learning
</h1>
</div>


**MOSDT is the first algorithm designed for multi-agent offline safe reinforcement learning (MOSRL). MOSDB is the first dataset and benchmark for this domain.**

Unlike most existing knowledge-distillation-based multi-agent RL methods, MOSDT uses policy self-distillation (PSD) with a new global information reconstruction scheme that fuses the observation features of all agents, which streamlines training and improves parameter efficiency. We adopt full parameter sharing across agents, which substantially reduces the parameter count and, by stabilizing training, improves returns by up to 38.4-fold. We also propose a plug-and-play cost binary embedding (CBE) module that encodes cumulative costs as binary safety signals and embeds these signals into the return features for efficient information aggregation.
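To make the CBE idea concrete, here is a minimal, dependency-free sketch. It is not the repository's implementation: the thresholding rule (a step is "safe" while the running cumulative cost stays within a hypothetical `cost_limit`), the function names, and the fusion by element-wise addition are all assumptions made for illustration. In a real model the two embedding vectors would presumably be learned parameters.

```python
from itertools import accumulate


def cost_binary_signals(step_costs, cost_limit):
    """Return 1 for each timestep whose running cumulative cost is
    within cost_limit, else 0. (Assumed binarization rule.)"""
    return [1 if c <= cost_limit else 0 for c in accumulate(step_costs)]


def embed_into_return_features(return_features, signals, safe_vec, unsafe_vec):
    """Add the embedding vector selected by each binary signal to the
    matching return feature vector. (Assumed fusion: element-wise sum.)"""
    fused = []
    for feat, signal in zip(return_features, signals):
        emb = safe_vec if signal == 1 else unsafe_vec
        fused.append([f + e for f, e in zip(feat, emb)])
    return fused


# Toy trajectory: per-step costs and 2-dimensional return features.
step_costs = [0.0, 1.0, 0.0, 2.0]
signals = cost_binary_signals(step_costs, cost_limit=1.0)

return_features = [[0.5, 0.5] for _ in step_costs]
fused = embed_into_return_features(
    return_features,
    signals,
    safe_vec=[1.0, 0.0],    # hypothetical embedding for the "safe" signal
    unsafe_vec=[0.0, 1.0],  # hypothetical embedding for the "unsafe" signal
)
print(signals)
print(fused)
```

Because the signal is binary, the embedding table needs only two rows, which is one plausible reading of why the module is described as plug-and-play.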

On the MOSDB benchmark, MOSDT achieves state-of-the-art (SOTA) returns in **14 out of 18** tasks (across all base environments, including MuJoCo, Safety Gym, and Isaac Gym) while ensuring complete safety, using only **65%** of the execution parameter count of CDT, a SOTA single-agent offline safe RL method.

**MOSDB dataset and results can be found [here](https://drive.google.com/drive/folders/1rdkD3eZkX3pnFeKp6GHKGnjCaBMUasS-?usp=sharing).**

# Installation

Create and activate a new Python environment:
```bash
conda create -n MOSDT python=3.8
conda activate MOSDT
```

Please install the [DSRL](https://www.offline-saferl.org/) benchmark.

Install MOSDT:
```bash
git clone https://github.com/Lucian1115/MOSDT.git
cd MOSDT
pip install -e .
```

Please download the [MOSDB](https://drive.google.com/drive/folders/1rdkD3eZkX3pnFeKp6GHKGnjCaBMUasS-?usp=sharing) dataset.

# Training

Train MOSDT on a task in the MOSDB dataset:
```bash
python examples/train/train_mosdt.py --task FreightFrankaCloseDrawer
```