# Prioritized Model Experience Replay
## Requirements
Create a new conda environment named `pmer` and activate it:
```bash
conda create --name pmer python=3.9
conda activate pmer
```

Install all the required dependencies:
1. Install the MuJoCo engine.

    > Download and install `mujoco210`, and make sure your library path is set correctly. If pip reports dependency conflicts, you can ignore them; just run `pip install -r requirements.txt` again to ensure all packages are installed (do not change the `gym` or `gymnasium` versions).
2. Install the Python packages:
   ```bash
   pip install -r requirements.txt
   ``` 
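Step 1 leaves the library path setup to you. The snippet below is a minimal sketch for Linux, assuming the environments go through `mujoco-py` (which Gym's `-v2`/`-v3` MuJoCo tasks rely on) and that `mujoco210` was extracted to `~/.mujoco/mujoco210`, the default location `mujoco-py` looks in. `MUJOCO_DIR` is a hypothetical helper variable introduced only for illustration.

```bash
# Assumption: mujoco210 comes from the MuJoCo 2.1.0 release, e.g.
#   https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
# and was extracted with: mkdir -p ~/.mujoco && tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco
MUJOCO_DIR="$HOME/.mujoco/mujoco210"   # hypothetical helper variable (default mujoco-py location)

# Append the MuJoCo binaries to the dynamic library path,
# without leaving a leading ':' when LD_LIBRARY_PATH was empty or unset
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$MUJOCO_DIR/bin"

# Warn early if the expected directory is missing
if [ ! -d "$MUJOCO_DIR/bin" ]; then
    echo "warning: $MUJOCO_DIR/bin not found; check where mujoco210 was extracted" >&2
fi
```

To make the setting persist across shells, you could add the `export` line to your `~/.bashrc`.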

## Example
### PMER+MBPO
---
For a quick start, you can test `PMER+MBPO` on `InvertedPendulum-v2` with:
> PMER + MBPO
```bash
python3 ./unstable_baselines/model_based_rl/mbpo/main_prior.py \
        unstable_baselines/model_based_rl/mbpo/configs/InvertedPendulum-v2.py \
        --prior_ratio 0.05 \
        --gpu 0
```

To compare, run the same configuration with `--prior_ratio 0.`:

```bash
python3 ./unstable_baselines/model_based_rl/mbpo/main_prior.py \
        unstable_baselines/model_based_rl/mbpo/configs/InvertedPendulum-v2.py \
        --prior_ratio 0. \
        --gpu 0
```

The results are stored in `logs/mbpo/InvertedPendulum-v2`. This takes around 5 minutes.

Similarly, on the `Hopper-v3` task, run with `--prior_ratio 0.3` and, for comparison, with `--prior_ratio 0.`:

```bash
python3 ./unstable_baselines/model_based_rl/mbpo/main_prior.py \
        unstable_baselines/model_based_rl/mbpo/configs/Hopper-v3.py \
        --prior_ratio 0.3 \
        --gpu 0
```

```bash
python3 ./unstable_baselines/model_based_rl/mbpo/main_prior.py \
        unstable_baselines/model_based_rl/mbpo/configs/Hopper-v3.py \
        --prior_ratio 0. \
        --gpu 0
```

### PMER+ADMPO
---
To run `PMER+ADMPO`, use:
> PMER + ADMPO
```bash
python main.py --env-name HumanoidTruncatedObs --device cuda:0
python main.py --env-name Hopper-v3 --device cuda:0
```

## Reference
- ADMPO algorithm paper (Online Part): Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning (link removed for anonymous review).
- MBPO PyTorch implementation: Unstable Baselines (link removed for anonymous review).
- MBPO algorithm paper: When to Trust Your Model: Model-Based Policy Optimization (link removed for anonymous review).
