
# SELF-ALIGNMENT FOR OFFLINE SAFE REINFORCEMENT LEARNING

## Instructions

Experiments require MuJoCo, Safety-Gymnasium, and DSRL. Install MuJoCo by following the instructions in the [mujoco-py repo](https://github.com/openai/mujoco-py). Then install the dependencies with the following command:

```
conda env create -f conda_env.yml
```
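After creating and activating the environment, a quick sanity check can confirm that the required packages are importable. The snippet below is a minimal sketch: the import names `mujoco_py`, `safety_gymnasium`, and `dsrl` are assumptions about how the three requirements are exposed as Python modules, and `conda_env.yml` remains the authoritative dependency list.

```python
import importlib.util

# Assumed import names for MuJoCo (mujoco-py), Safety-Gymnasium, and DSRL.
# Adjust them if your installation exposes different module names.
ASSUMED_MODULES = ["mujoco_py", "safety_gymnasium", "dsrl"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not available".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_modules(ASSUMED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not verify that the MuJoCo binaries themselves are set up correctly.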

## Downloading datasets

Datasets are downloaded to the `data` directory.
To experiment in a MuJoCo environment such as Hopper-v3 with the medium dataset, run the following script to download and save the dataset:

```
python download_d4rl_datasets.py --env Hopper-v3 --proficiency medium
```

To experiment in a Safety-Gymnasium environment such as OfflinePointGoal1-v0 (only an expert dataset is available), run the following script to download and save the dataset:

```
python download_dsrl_datasets.py --env OfflinePointGoal1-v0
```

Alternatively, to download all Safety-Gymnasium environment datasets used in our experiments, run the following script:

```
python download_dsrl_datasets.py --download_all
```
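When several datasets are needed, the download commands above can be assembled and run from one small driver script. This is a hedged sketch rather than part of the repository: the helper names `d4rl_command`, `dsrl_command`, and `run_all` are hypothetical, and only the script names and flags shown above are taken from this README. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def d4rl_command(env, proficiency):
    """Build the documented MuJoCo dataset download command."""
    return [sys.executable, "download_d4rl_datasets.py",
            "--env", env, "--proficiency", proficiency]


def dsrl_command(env=None):
    """Build the documented Safety-Gymnasium download command.

    With no env given, fall back to the documented --download_all flag.
    """
    cmd = [sys.executable, "download_dsrl_datasets.py"]
    cmd += ["--env", env] if env else ["--download_all"]
    return cmd


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    commands = [
        d4rl_command("Hopper-v3", "medium"),
        dsrl_command("OfflinePointGoal1-v0"),
    ]
    run_all(commands, dry_run=True)
```

Call `run_all(commands, dry_run=False)` from the repository root to actually download the datasets.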


## Example usage

Experiments can be reproduced with the following:

```
python experiment.py --env hopper --dataset medium --exp_name tmp
```

Or, for Safety-Gymnasium environments:

```
python experiment.py --env dsrl_pointgoal1 --exp_name tmp
```

To load a trained model and compare the default DT with DT+ours, run the following script:

```
python experiment.py --env dsrl_pointgoal1 --exp_name tmp --load
```

In the experiment results, `test/returns`, `costs`, and `failures` report the performance of the default DT, while `test/return_prom`, `costs_prom`, and `failure_prom` report the performance of DT+ours.
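To read the two sets of metrics side by side, the logged values can be paired up as sketched below. This is an illustration under assumptions: it supposes the results are available as a flat dictionary, and that every key carries the `test/` prefix (the README spells the prefix out only for the return metrics). The `compare` helper and the `PAIRS` table are hypothetical and not part of the repository.

```python
# (default DT key, DT+ours key); the "test/" prefix on the cost and
# failure keys is an assumption, adjust to match your actual logs.
PAIRS = [
    ("test/returns", "test/return_prom"),
    ("test/costs", "test/costs_prom"),
    ("test/failures", "test/failure_prom"),
]


def compare(metrics):
    """Return (key, default_dt, dt_ours, difference) for each pair present."""
    rows = []
    for base_key, ours_key in PAIRS:
        if base_key in metrics and ours_key in metrics:
            base, ours = metrics[base_key], metrics[ours_key]
            rows.append((base_key, base, ours, ours - base))
    return rows


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not experimental results.
    example = {
        "test/returns": 10.0, "test/return_prom": 12.0,
        "test/costs": 4.0, "test/costs_prom": 1.0,
    }
    for key, base, ours, diff in compare(example):
        print(f"{key}: default DT={base}, DT+ours={ours}, diff={diff:+}")
```

Pairs with a missing key are skipped, so the helper also works on partial logs.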