# ADMPO: Any-step Dynamics Model for Policy Optimization

This repository contains the code for the paper "Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning".

## Requirements

To install all the required dependencies:

1. Install the MuJoCo engine, which can be downloaded from [here](https://mujoco.org/download).
2. Install the Python packages listed in `requirements.txt` with `pip install -r requirements.txt`. Before installing, set the `mujoco-py` version in `requirements.txt` to match the version of the MuJoCo engine you installed.
3. Manually download and install `d4rl` package from [here](https://github.com/rail-berkeley/d4rl).
4. Manually download and install `neorl` package from [here](https://github.com/polixir/NeoRL).
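Step 2 requires matching `mujoco-py` to the installed MuJoCo engine. The following is a minimal sketch of how that edit could be scripted. `mujoco_py_spec` and `pin_mujoco_py` are hypothetical helpers, not part of this repository, and the version pairings (MuJoCo 2.1 with mujoco-py 2.1.x, MuJoCo 2.0 with 2.0.x, MuJoCo 1.50 with 1.50.x) are assumptions based on mujoco-py's release series, so check the mujoco-py documentation before relying on them.

```shell
# Hypothetical helpers (not part of this repository). The MuJoCo/mujoco-py
# version pairings below are ASSUMPTIONS; verify them against mujoco-py's docs.

# Print a pip requirement specifier for a given MuJoCo engine version.
mujoco_py_spec() {
  case "$1" in
    2.1*) echo "mujoco-py>=2.1,<2.2" ;;   # MuJoCo 2.1.x (mujoco210)
    2.0*) echo "mujoco-py>=2.0,<2.1" ;;   # MuJoCo 2.0 (mujoco200)
    1.5*) echo "mujoco-py>=1.50,<2.0" ;;  # MuJoCo 1.50 (mjpro150)
    *) echo "unsupported MuJoCo version: $1" >&2; return 1 ;;
  esac
}

# Replace any existing mujoco-py / mujoco_py line in a requirements file
# with the given specifier (appending it if no such line exists).
pin_mujoco_py() {
  local file="$1" spec="$2" tmp
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # Keep every line except a mujoco-py requirement; the trailing character
  # class avoids matching unrelated packages such as mujoco-python-viewer.
  grep -v -i -E '^mujoco[-_]py([^a-z0-9_.-]|$)' "$file" > "$tmp" || true
  printf '%s\n' "$spec" >> "$tmp"
  # Copy back (rather than mv) so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `spec="$(mujoco_py_spec 2.1.0)" && pin_mujoco_py requirements.txt "$spec"` rewrites the `mujoco-py` line before you run `pip install -r requirements.txt`. Versions not listed (for example MuJoCo 2.2 or later, which mujoco-py does not target) are rejected with an error.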

## Run an experiment 

### Online Setting

```shell
python main4online.py --env-name=[Env name] 
```

The config files in `config/` provide the default settings for each task.
`--env-name` selects one of these config files; the available online tasks include Hopper-v3, Walker2d-v3, AntTruncatedObs-v3, and HumanoidTruncatedObs-v3.

All results will be stored in the `result` folder.

For example, to run ADMPO-ON on Hopper:

```bash
python main4online.py --env-name=Hopper-v3
```
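The online tasks listed above can also be launched one after another. This is a minimal sketch assuming a bash- or dash-compatible shell; `run_online` and the `PYTHON` override are hypothetical helpers, not part of this repository, and the default task list is copied from the config names given above.

```shell
# Hypothetical helper (not part of this repository): run ADMPO-ON on each task
# given as an argument, or on the four online tasks from the README by default.
# Set PYTHON=echo for a dry run that only prints the commands.
run_online() {
  local py="${PYTHON:-python}" env
  # Default task list, copied from the config names in the README.
  [ "$#" -gt 0 ] || set -- Hopper-v3 Walker2d-v3 AntTruncatedObs-v3 HumanoidTruncatedObs-v3
  for env in "$@"; do
    # Stop at the first failing run instead of silently continuing.
    "$py" main4online.py --env-name="$env" || return 1
  done
}
```

For example, `run_online Walker2d-v3` runs a single task, and `PYTHON=echo run_online` prints the four commands without running them. Runs are sequential so that each one finishes before the next starts.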

### Offline Setting

```shell
python main4offline.py --env=[Env] --env-name=[Env name] 
```

As in the online setting, the config files in `config/` provide the default settings for each task.
`--env` selects the benchmark (D4RL or NeoRL), and `--env-name` selects the config file in `config/` for the chosen dataset.

All results will be stored in the `result` folder.

For example, to run ADMPO-OFF on the hopper-medium-v2 dataset of the D4RL benchmark:

```bash
python main4offline.py --env=d4rl --env-name=hopper-medium-v2
```
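To queue several offline runs, a sketch like the following could wrap `main4offline.py`. It assumes each argument has the form `benchmark:dataset`, passes the part before the first colon as `--env` and the rest as `--env-name`, and assumes a matching config file exists in `config/` for every dataset named. `run_offline` and the `PYTHON` override are hypothetical, not part of this repository.

```shell
# Hypothetical batch runner (not part of this repository): each argument is
# a "benchmark:dataset" pair, split into --env and --env-name for
# main4offline.py. Set PYTHON=echo for a dry run that only prints commands.
run_offline() {
  local py="${PYTHON:-python}" pair bench name
  for pair in "$@"; do
    # Require at least one character on each side of the colon.
    case "$pair" in
      ?*:?*) ;;
      *) echo "expected benchmark:dataset, got '$pair'" >&2; return 1 ;;
    esac
    bench="${pair%%:*}"   # text before the first colon, e.g. d4rl
    name="${pair#*:}"     # text after the first colon, e.g. hopper-medium-v2
    # Stop at the first failing run.
    "$py" main4offline.py --env="$bench" --env-name="$name" || return 1
  done
}
```

For example, `run_offline d4rl:hopper-medium-v2 d4rl:walker2d-medium-v2` runs both D4RL datasets in sequence (the second assumes a `walker2d-medium-v2` config exists), and prefixing the call with `PYTHON=echo` prints the commands instead. The benchmark part is passed through unchanged, so it must be spelled exactly as `main4offline.py` expects for `--env`.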