# Constrained Diffusion Policy Optimization for Offline Reinforcement Learning

## Overview
This is the codebase for the **two-fold improved diffusion policy** (**TDP**), built on the **CleanDiffuser** codebase.

## Installation
Before installing, change into this project's root directory, then run:

```
conda create --name tdp python=3.10
conda activate tdp
pip install -e .
pip install git+https://github.com/Farama-Foundation/d4rl@726eef4503b6a668f59904fde158b7b966d08ecc
pip install stable-baselines3==2.3.0
pip install shimmy
pip install gymnasium==0.29.0
pip install mujoco==3.1.6
pip install dm_control==1.0.20
```

## Examples

For MuJoCo locomotion tasks, run the code as follows:

```
# TDP on Halfcheetah medium-replay
python pipelines/tdp_mujoco.py task=halfcheetah-medium-replay-v2 
```
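To run several locomotion datasets back to back, a small loop around the same command can help. The following is only a sketch: the `hopper` and `walker2d` task names are assumptions based on D4RL naming and are not confirmed by this repository, and the `DRY_RUN` variable is a hypothetical convenience that prints the commands instead of launching training.

```shell
# Task names other than halfcheetah-medium-replay-v2 are assumptions based on
# D4RL naming; adjust the list to the datasets you actually use.
tasks="halfcheetah-medium-replay-v2 hopper-medium-replay-v2 walker2d-medium-replay-v2"

# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

planned=0
for task in $tasks; do
  planned=$((planned + 1))
  if [ "$DRY_RUN" = "1" ]; then
    echo "python pipelines/tdp_mujoco.py task=$task"
  else
    # Stop at the first failing run so the error is not buried by later output.
    python pipelines/tdp_mujoco.py "task=$task" || { echo "failed: $task" >&2; break; }
  fi
done
```

Set `DRY_RUN=0` in the environment to launch the runs for real.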

For Antmaze tasks other than Antmaze ultra-play and Antmaze ultra-diverse, run the code as follows:

```
# TDP on Antmaze medium-play
python pipelines/tdp_antmaze.py task=antmaze-medium-play-v0
```

For Antmaze ultra-play and Antmaze ultra-diverse tasks, run the code as follows:

```
# TDP on Antmaze ultra-play
python pipelines/tdp_antmaze_ultra.py task=antmaze-ultra-play-v0
```

For Kitchen tasks, run the code as follows:

```
# TDP on Kitchen-mixed
python pipelines/tdp_kitchen.py task=kitchen-mixed-v0
```

If the process stops unexpectedly, raise the limit on open file descriptors with the following command, then rerun:

```
ulimit -n 1000000
```
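The `ulimit -n` command can fail when the requested value exceeds the hard limit set for your account. As a hedged sketch, the snippet below caps the request at the hard limit before applying it; the variable names `want`, `hard`, and `target` are illustrative only.

```shell
# Requested soft limit, taken from the command above.
want=1000000

# The hard limit is the ceiling an unprivileged user may raise the soft limit to.
hard=$(ulimit -H -n)

# "unlimited" imposes no ceiling; otherwise cap the request at the hard limit.
if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
  target=$hard
else
  target=$want
fi

# Apply the soft limit; report instead of aborting if the system refuses.
ulimit -S -n "$target" 2>/dev/null || echo "could not set limit to $target" >&2
echo "open-file soft limit is now $(ulimit -S -n)"
```

The limit applies only to the current shell and the processes it starts, so run these lines in the same terminal session that launches the training script.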
