# Guide to reproducing the results
This codebase contains an online ME-TRPO implementation, forked from [Model-Ensemble Trust-Region Policy Optimization (ME-TRPO)](https://github.com/WilsonWangTHU/mbbl-metrpo), which is part of the [Model Based Reinforcement Learning Benchmarking Library (MBBL)](https://github.com/WilsonWangTHU/mbbl).

Kurutach, Thanard, et al. "Model-Ensemble Trust-Region Policy Optimization." arXiv preprint arXiv:1802.10592 (2018). [link](https://arxiv.org/abs/1802.10592)

## Dependencies
We recommend using Docker.

You can use Python 3.6. Download MuJoCo 1.31 from https://www.roboti.us/, then install the package dependencies:

```
pip install -r requirements.txt
```

## Run experiments
Run experiments using the following command:

```
python main.py --env <env_name> --exp_name <experiment_name> --sub_exp_name <exp_save_dir> --param_path configs/params_<env_name>_online.json --random_seeds 0 --onpol_iters 500
```

- `env_name`: one of `(ant, half_cheetah, hopper, walker2d, cheetah_run)`
- `exp_name`: what you want to call your experiment
- `sub_exp_name`: partial path for saving experiment logs and results
- `param_path`: path to config json file
- `onpol_iters`: number of outer iterations (the number of inner iterations is set to 25)
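
The placeholders above can also be filled in programmatically. The following is a minimal sketch, not part of this repository: `build_command` and `SUPPORTED_ENVS` are hypothetical names, and the only assumptions are the flags and the `configs/params_<env_name>_online.json` pattern shown in the command above.

```python
import shlex

# Environment names listed in this README.
SUPPORTED_ENVS = ("ant", "half_cheetah", "hopper", "walker2d", "cheetah_run")


def build_command(env_name, exp_name, sub_exp_name, seed=0, onpol_iters=500):
    """Return the argument list for one run of main.py (hypothetical helper)."""
    if env_name not in SUPPORTED_ENVS:
        raise ValueError(
            "unknown env {!r}; expected one of {}".format(env_name, SUPPORTED_ENVS)
        )
    return [
        "python", "main.py",
        "--env", env_name,
        "--exp_name", exp_name,
        "--sub_exp_name", sub_exp_name,
        # The config file name follows the pattern shown in the command above.
        "--param_path", "configs/params_{}_online.json".format(env_name),
        "--random_seeds", str(seed),
        "--onpol_iters", str(onpol_iters),
    ]


if __name__ == "__main__":
    cmd = build_command("hopper", "example", "me_trpo")
    # Print a shell-safe string; the list itself can be passed to subprocess.run.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Rejecting an unlisted environment name early avoids launching a run that would fail only when the config file is looked up.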

Experiment results will be logged to `./log/<exp_save_dir>/<experiment_name>`

For example: `python main.py --env ant --exp_name example --sub_exp_name me_trpo --param_path configs/params_ant_online.json --random_seeds 1234 --onpol_iters 500`
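
To repeat a run over several seeds, a dry-run planner along the following lines may help. This is a sketch under stated assumptions: `plan_runs`, `log_dir`, and the `<env_name>_seed<seed>` naming scheme are hypothetical, and it passes one seed per invocation because this README does not say whether `--random_seeds` accepts several values. It only prints commands and the log directory implied by the `./log/<exp_save_dir>/<experiment_name>` pattern; it does not launch anything.

```python
from pathlib import Path


def log_dir(sub_exp_name, exp_name, root="log"):
    """Directory where a run's results are expected, per the README's pattern."""
    return Path(root) / sub_exp_name / exp_name


def plan_runs(env_name, sub_exp_name, seeds, onpol_iters=500):
    """Yield (command string, expected log directory) for each seed."""
    for seed in seeds:
        # Hypothetical naming scheme: one experiment name per seed,
        # so that runs do not share a log directory.
        exp_name = "{}_seed{}".format(env_name, seed)
        command = (
            "python main.py --env {env} --exp_name {exp} "
            "--sub_exp_name {sub} "
            "--param_path configs/params_{env}_online.json "
            "--random_seeds {seed} --onpol_iters {iters}"
        ).format(env=env_name, exp=exp_name, sub=sub_exp_name,
                 seed=seed, iters=onpol_iters)
        yield command, log_dir(sub_exp_name, exp_name)


if __name__ == "__main__":
    for command, directory in plan_runs("ant", "me_trpo", seeds=[0, 1, 2]):
        print(command)
        print("  logs expected under:", directory)
```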
