# BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

This directory contains the implementation of our proposed Bidirectional Trajectory Diffusion (BiTrajDiff) framework.

## Requirements

- Python==3.8.10
- torch==2.4.0
- gym==0.21.0
- mujoco-py==2.0.2.0
- cython==0.29.32
- einops==0.7.0
- zarr==2.12.0
- numpy==1.22.4
- numba==0.56.4
- h5py==3.10.0
- scipy==1.9.0
- hydra-core==1.2.0
- dill==0.3.5.1
- av==10.0.0
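
To confirm that an environment matches the pins above, a minimal sketch along these lines may help. The `check_pins` helper is hypothetical and not part of this repository; it only relies on the standard-library `importlib.metadata` module, and it assumes the package names above are the names used on PyPI.

```python
from importlib import metadata

# Pins copied from the list above (the Python version is not checked here).
PINS = {
    "torch": "2.4.0",
    "gym": "0.21.0",
    "mujoco-py": "2.0.2.0",
    "cython": "0.29.32",
    "einops": "0.7.0",
    "zarr": "2.12.0",
    "numpy": "1.22.4",
    "numba": "0.56.4",
    "h5py": "3.10.0",
    "scipy": "1.9.0",
    "hydra-core": "1.2.0",
    "dill": "0.3.5.1",
    "av": "10.0.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing versions.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.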

## Tasks and Datasets

We evaluate on the locomotion, navigation, and manipulation tasks of the [D4RL Benchmark](https://github.com/rail-berkeley/d4rl).

#### Locomotion tasks

Three tasks are used in this paper: *halfcheetah-v2, hopper-v2, walker2d-v2*. Each task has three types of datasets: *medium*, *medium-replay*, and *medium-expert*.
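
The three tasks and three dataset types combine into nine datasets. The sketch below enumerates them using the D4RL naming convention `<task>-<dataset type>-<version>`. The helper name `locomotion_dataset_ids` is hypothetical, and whether the scripts in this repository expect exactly these strings is an assumption.

```python
from itertools import product

TASKS = ["halfcheetah", "hopper", "walker2d"]
DATASET_TYPES = ["medium", "medium-replay", "medium-expert"]


def locomotion_dataset_ids(version="v2"):
    """Build D4RL-style IDs such as 'hopper-medium-replay-v2'."""
    return [
        f"{task}-{kind}-{version}"
        for task, kind in product(TASKS, DATASET_TYPES)
    ]


if __name__ == "__main__":
    for env_id in locomotion_dataset_ids():
        print(env_id)
```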

#### Navigation tasks

The navigation tasks include Maze and Antmaze tasks. The Maze tasks consist of *maze2d-umaze-v1, maze2d-medium-v1, maze2d-large-v1*, while the Antmaze tasks consist of *antmaze-umaze-diverse-v0, antmaze-medium-diverse-v0, antmaze-large-diverse-v0*. All the datasets are collected by the expert behavior policy.

#### Manipulation tasks

The manipulation tasks consist mainly of Franka Kitchen tasks: *kitchen-complete-v0, kitchen-partial-v0, kitchen-mixed-v0*. All the datasets are collected by the expert behavior policy.

## Usage

BiTrajDiff model training can be reproduced with:

```
python extension_d4rl_mujoco.py task=<env_name> mode=train_diffusion
```

Once training has finished, you can use the trained BiTrajDiff model to generate your own dataset to enhance an offline RL algorithm:

```
python extension_d4rl_mujoco.py task=<env_name> mode=stitch
```
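
The two steps above must run in order: training first, then generation. A minimal sketch of a wrapper that chains them is shown below. The helpers `build_commands` and `run_pipeline` are hypothetical and not part of this repository, and the example environment name `hopper-medium-v2` is an assumption about what `task=` accepts.

```python
import subprocess
import sys

SCRIPT = "extension_d4rl_mujoco.py"
MODES = ("train_diffusion", "stitch")  # train the model first, then generate data


def build_commands(env_name):
    """Return one argv list per mode, in the order they should run."""
    return [
        [sys.executable, SCRIPT, f"task={env_name}", f"mode={mode}"]
        for mode in MODES
    ]


def run_pipeline(env_name, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(env_name):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if training fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("hopper-medium-v2")  # assumed task name; dry run by default
```

With `dry_run=True` the wrapper only prints the commands, which is a cheap way to check them before starting a long training run.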

More detailed hyperparameters are provided in the `config` directory.