# Reach Avoid Decision Transformer

This repository is the official implementation of [Reach Avoid Decision Transformer (raDT)](). 
An online version of this (anonymized) repository can be found at [this link](https://anonymous.4open.science/r/reach-avoid-decision-transformer-2441).

The code for training and evaluating the baselines is built on top of the [RbSL repository](https://github.com/Sunlighted/RbSL.git), and the code for training and evaluating raDT is built on top of the [MGPO repository](https://github.com/PKU-RL/MGPO.git). 

## Setup

All commands in this section are executed from the root directory. All filepaths in this section are relative to the root directory.

### Environment and Requirements

Install and activate the conda environment containing the required dependencies for this repository:

```environment
conda env create -f environment.yml
conda activate radt
```

The following packages must be installed manually via:
```cosineinstall
pip install 'git+https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup'
pip install 'git+https://github.com/hklarner/pyboolnet'
```

CUDA-compatible PyTorch must be installed manually by following the [instructions for your specific system](https://pytorch.org/get-started/locally/).

Install `raDT` as a Python library using:

```library
pip install -e .
```

This is required for import statements to work correctly.

### Download Data and Pre-Trained Models

You can download the training data and pre-trained model weights [here](https://drive.google.com/file/d/129EqzJpGfA-N8zV__Mj5RPl7i8ZIlO_h/view?usp=share_link). When uncompressed, the file `weights_and_data.zip` has
the same structure as this repository. Move the model weights and dataset files from their location in `weights_and_data` to the corresponding locations in this repository. Namely:

* Move all training data for baselines to `raDT/baselines/offline_data/`
* Move all training data for raDT to `raDT/radt/dataset/`
* Move all pre-trained model weights for raDT to `raDT/radt/model_saved/`
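
The moves above can also be scripted. The following is a minimal sketch, assuming the archive was extracted to a `weights_and_data` directory whose subdirectories mirror the destinations listed above. `MOVES` and `move_downloads` are hypothetical helpers, not part of this repository.

```python
import shutil
from pathlib import Path

# (source subdirectory in the extracted archive, destination in this repository).
# Destinations come from the list above; the source paths ASSUME the archive mirrors them.
MOVES = [
    ("raDT/baselines/offline_data", "raDT/baselines/offline_data"),
    ("raDT/radt/dataset", "raDT/radt/dataset"),
    ("raDT/radt/model_saved", "raDT/radt/model_saved"),
]


def move_downloads(archive_root, repo_root):
    """Move every entry under each archive subdirectory into the matching repository directory."""
    moved = []
    for src_rel, dst_rel in MOVES:
        src_dir = Path(archive_root) / src_rel
        dst_dir = Path(repo_root) / dst_rel
        if not src_dir.is_dir():
            continue  # skip parts of the archive that are absent
        dst_dir.mkdir(parents=True, exist_ok=True)
        for item in sorted(src_dir.iterdir()):
            target = dst_dir / item.name
            shutil.move(str(item), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Run from the root directory, with the archive extracted next to it.
    for path in move_downloads("weights_and_data", "."):
        print("moved", path)
```

If the extracted archive uses a different layout, adjust the source paths in `MOVES` accordingly.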

### Set Constants

In `raDT/constants.py`, set `HOME_PATH` to the absolute path of the root directory on your system and `SAVE_PATH` to the directory
where you would like to save temporary model checkpoints.
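
As an illustration, the two constants might look like the sketch below. Both paths are hypothetical placeholders, and whether `constants.py` expects plain strings or trailing separators is an assumption to check against the file itself.

```python
import os

# Hypothetical example values; replace both with locations on your own system.
HOME_PATH = os.path.abspath(os.path.expanduser("~/reach-avoid-decision-transformer"))
SAVE_PATH = os.path.abspath(os.path.expanduser("~/radt_checkpoints"))

# Optional sanity check: HOME_PATH is documented as an absolute path.
assert os.path.isabs(HOME_PATH), "HOME_PATH must be an absolute path"
```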

## Data Prep

### Preparing training data for Gymnasium environments

#### Generating random policy training data 

All commands in this section are executed from the `raDT/baselines` directory. All filepaths are relative to the `raDT/baselines` directory.
```
cd raDT/baselines
```

Example data generation batch scripts can be found in `submit_data_generation.sh`. Alternatively, run generation directly using:

```datagen
python3 data_generation_fetch.py --env_name FetchReachObstacle [additional parameters]
```
for `FetchReachObstacle` or

```datagen
python3 data_generation_maze.py --env_name PointMazeObstacle [additional parameters]
```

for `PointMazeObstacle`.

The following are relevant parameters for the data generation scripts:
* `--env_name`: Name of environment: can be `FetchReachObstacle`, `PointMazeObstacle`, or a custom environment.
* `--num_timesteps`: Number of timesteps to generate
* `--maze`: If running for `PointMazeObstacle` environment, use `U_MAZE`, otherwise do not specify
* `--num_avoid`: If running for `PointMazeObstacle` environment, set to the number of avoid states to include. 
* `--suffix`: Optional, a suffix appended to the dataset name

The resulting dataset can be used directly with baseline models.
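
The data generation parameters listed above can be assembled programmatically. The following is a minimal sketch in which `datagen_args` is a hypothetical helper and all values are illustrative; it only builds the flags, which you would append to the generation command for your environment.

```python
import shlex


def datagen_args(env_name, num_timesteps, maze=None, num_avoid=None, suffix=None):
    """Build the flag list for one data generation run."""
    args = ["--env_name", env_name, "--num_timesteps", str(num_timesteps)]
    if env_name == "PointMazeObstacle":
        if maze is not None:
            args += ["--maze", maze]
        if num_avoid is not None:
            args += ["--num_avoid", str(num_avoid)]
    elif maze is not None or num_avoid is not None:
        # --maze and --num_avoid are documented for PointMazeObstacle only.
        raise ValueError("maze and num_avoid only apply to PointMazeObstacle")
    if suffix:
        args += ["--suffix", suffix]
    return args


# Hypothetical values, for illustration only.
args = datagen_args("PointMazeObstacle", 100000, maze="U_MAZE", num_avoid=2, suffix="demo")
print(shlex.join(args))
```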

#### Relabeling/reformatting pipeline for datasets used to train raDT

All commands in this section are executed from the `raDT/radt` directory. All filepaths are relative to the `raDT/radt` directory.
```
cd raDT/radt
```

All relabeling and data prep code to create a dataset for training raDT can be found in the notebook `data_gymnasium.ipynb`. Note that this pipeline relies on a random policy training dataset already being generated using the pipeline in the section above.

### Preparing training data for Cardiogenesis environment

All commands in this section are executed from the `raDT/radt` directory. All filepaths are relative to the `raDT/radt` directory.
```
cd raDT/radt
```

The dataset generation and preparation pipeline for the `cardiogenesis` environment is located in the notebook `data_cardiogenesis.ipynb`.

## Training

### Training raDT

All commands in this section are executed from the `raDT/radt` directory. All filepaths are relative to the `raDT/radt` directory.
```
cd raDT/radt
```

Using a GPU is recommended for training raDT. Example training batch scripts can be found in `train.sh`. One can also execute training 
directly using the command:

```train_radt
python3 train.py [parameters]
```

The following are relevant parameters to `train.py`:

* `--exp-name`: Experiment name, will be used as group name for `wandb`.
* `--instance_prefix`: Experiment instance name, will be used as run name for `wandb`.
* `--env`: The environment we are training on: can be `reach_obstacle`, `pointmaze_obstacle`, `cardiogenesis`, or a custom environment.
* `--dataset_path`: Relative path to training data. Should begin with `dataset/...`
* `--n_head`: Number of attention heads
* `--n_layer`: Number of attention layers
* `--embed_dim`: Embedding dimensionality (should be a multiple of `n_head`)
* `--K`: Context length, should match `max_ep_len`
* `--max_ep_len`: Max episode length, should match `K`
* `--batch_size`: Batch size
* `--test_eval_interval`: How often to run a round of evaluation during training (in training steps)
* `--max_iters`: Maximum number of training steps
* `--scheduler`: Learning rate scheduler: can be `lambdalr` or `cosinewarmuprestarts`
* `--T_0`: The `T_0` parameter to the `cosinewarmuprestarts` scheduler (enter 10x the number desired)
* `--warmup_steps`: Number of warmup steps (enter 10x the number desired)
* `--avoid_prompt`: Indicator flag; if present, avoid prompts will be supported, otherwise MGPO is used
* `--max_avoid_prompt_len`: Maximum avoid prompt length, set conservatively
* `--adelta`: Attention boosting delta to the prompt
* `--alpha1`: Balance constant $\alpha$ (as notated in the paper) in the loss function
* `--num_eval_episodes`: Number of evaluation episodes during a round of evaluation
* `--buffer_size`: Set to half of the width of the desired avoid box size
* `--bsa_box_size`: If running for `reach_obstacle` environment, set to half the width of the avoid box desired in the evaluation environment. If specifying this, do not specify `buffer_size`.
* `--maze`: If running for `pointmaze_obstacle` environment, use `U_MAZE`, otherwise do not specify
* `--num_avoid`: If running for `pointmaze_obstacle` environment, set to the number of avoid states to use
* `--fixed_interval`: If running for `cardiogenesis` environment, set to the number of boolean update steps of the boolean network model per single timestep (corresponding to $k$ described in the paper)
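
Several of the parameters above constrain one another. The following is a minimal sketch of a pre-flight check, assuming the parameters are collected in a dictionary keyed by flag name. `check_radt_config` and `entered_value` are hypothetical helpers that are not part of `train.py`, and all values are illustrative.

```python
def check_radt_config(cfg):
    """Return a list of problems found in a dict of train.py parameters."""
    problems = []
    if cfg["embed_dim"] % cfg["n_head"] != 0:
        problems.append("embed_dim should be a multiple of n_head")
    if cfg["K"] != cfg["max_ep_len"]:
        problems.append("K should match max_ep_len")
    if "buffer_size" in cfg and "bsa_box_size" in cfg:
        problems.append("specify either bsa_box_size or buffer_size, not both")
    if cfg["env"] != "pointmaze_obstacle" and ("maze" in cfg or "num_avoid" in cfg):
        problems.append("maze and num_avoid only apply to pointmaze_obstacle")
    if cfg["env"] != "cardiogenesis" and "fixed_interval" in cfg:
        problems.append("fixed_interval only applies to cardiogenesis")
    return problems


def entered_value(desired):
    """T_0 and warmup_steps are entered at 10x the desired number."""
    return 10 * desired


# Hypothetical values, for illustration only.
cfg = {
    "env": "pointmaze_obstacle",
    "n_head": 4,
    "embed_dim": 128,
    "K": 50,
    "max_ep_len": 50,
    "maze": "U_MAZE",
    "num_avoid": 2,
    "scheduler": "cosinewarmuprestarts",
    "T_0": entered_value(1000),
    "warmup_steps": entered_value(100),
}
for problem in check_radt_config(cfg):
    print("warning:", problem)
```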


### Training Baselines

All commands in this section are executed from the `raDT/baselines` directory. All filepaths are relative to the `raDT/baselines` directory.
```
cd raDT/baselines
```

Using a GPU is recommended for training baselines. Example training batch scripts can be found in `run_train.sh`. One can also execute training directly using the command:

```trainbaseline
python3 train.py [parameters]
```

The following are relevant parameters to `train.py`:

* `--project`: Project name to use in wandb
* `--env`: Environment name: can be `FetchReachObstacle` or `PointMazeObstacle`
* `--method`: Name of baseline approach: can be `AMlag`, `rbsl`, or `wgcsl`
* `--expert_percent`: Percentage of data to be from expert policies, set to 0
* `--random_percent`: Percentage of data to be from random policy, set to 1
* `--massless`: Set to 1 to use avoid regions that agent can pass through
* `--n-epochs`: Number of training epochs
* `--n-test-rollouts`: Number of episodes to evaluate for
* `--bsa_box_size`: If running for `FetchReachObstacle`, specifies the box size
* `--maze`: If running for `PointMazeObstacle` environment, use `U_MAZE`, otherwise do not specify
* `--num_avoid`: If running for `PointMazeObstacle` environment, set to number of avoid regions
* `--max_episode_steps`: Maximum number of steps in an evaluation episode

## Evaluation

### Evaluating raDT

All commands in this section are executed from the `raDT/radt` directory. All filepaths are relative to the `raDT/radt` directory.
```
cd raDT/radt
```

#### Evaluating raDT on custom Gymnasium Robotics environments

Example evaluation batch scripts can be found in `evaluate.sh`. One can also execute evaluation 
directly using the command:

```eval_radt
python3 train.py --evaluation --load-path [PATH/TO/MODEL/WEIGHTS] [other parameters]
```

Unless specified, all parameters should match the ones passed to `train.py` during training. Parameters specific to evaluation include:
* `--evaluation`: Flag indicating that we are evaluating an existing model, not training a new one
* `--load-path`: Absolute path to model we are evaluating
* `--device`: Device to evaluate model on
* `--num_eval_episodes`: Number of episodes to evaluate on, can be different from the value set in training
* `--bsa_box_size`: If evaluating on a `reach_obstacle` environment, specifies half the width of the boxes in the environment. Can be different from the training value
* `--num_avoid`: If evaluating on a `pointmaze_obstacle` environment, specifies the number of avoid states in the environment. Can be different from the training value
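
One way to keep evaluation parameters consistent with training is to start from the training parameters and override only the evaluation-specific ones listed above. The following is a minimal sketch, assuming the parameters are kept in a dictionary keyed by flag name. `eval_params` is a hypothetical helper, and the path and values are illustrative.

```python
import os

# Parameters that the list above allows to differ from training.
EVAL_OVERRIDES = {"device", "num_eval_episodes", "bsa_box_size", "num_avoid"}


def eval_params(train_params, load_path, **overrides):
    """Copy the training parameters, then apply evaluation-only settings."""
    unknown = set(overrides) - EVAL_OVERRIDES
    if unknown:
        raise ValueError(f"not an evaluation-specific parameter: {sorted(unknown)}")
    if not os.path.isabs(load_path):
        raise ValueError("--load-path should be an absolute path")
    params = dict(train_params)
    params.update(overrides)
    params["evaluation"] = True  # indicator flag
    params["load-path"] = load_path
    return params


# Hypothetical values, for illustration only.
train_params = {"env": "pointmaze_obstacle", "K": 50, "max_ep_len": 50, "num_avoid": 2}
weights = os.path.abspath("model_saved/example_model")  # hypothetical weights location
params = eval_params(train_params, weights, num_avoid=4, num_eval_episodes=100)
print(params)
```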

#### Evaluating raDT on Cardiogenesis Environment

The code for the evaluation pipeline for the `cardiogenesis` environment is located in the notebook `eval_cardiogenesis.ipynb`.

### Evaluating Baselines

All commands in this section are executed from the `raDT/baselines` directory. All filepaths are relative to the `raDT/baselines` directory.
```
cd raDT/baselines
```

Example evaluation batch scripts can be found in `submit_eval.sh`. One can also execute evaluation 
directly using the command:

```baselinesevaluation
python3 evaluate_fetch.py [parameters]
```

for `FetchReachObstacle` or

```baselinesevaluation
python3 evaluate_maze.py [parameters]
```

for `PointMazeObstacle`.

Unless specified, all parameters should match the ones passed to `train.py` during training. Parameters specific to evaluation include:
* `--model_path`: Path to the model that is being evaluated
* `--num_ep`: Number of evaluation episodes
* `--bsa_box_size`: If evaluating on a `FetchReachObstacle` environment, specifies half the width of the boxes in the environment. Should match training value
* `--num_avoid`: If evaluating on a `PointMazeObstacle` environment, specifies the number of avoid states in the environment. Should match training value