# RL with Perturbed Rewards

The implementation is based on [keras-rl](https://github.com/keras-rl/keras-rl). Thanks to the original authors!

## Dependencies
- python 3.5
- tensorflow 1.10.0, keras 2.1.0
- gym, scipy, joblib
- progressbar2, mpi4py, cloudpickle, opencv-python, h5py, pandas

Note: make sure you have installed the `baselines` package (in `gym-atari/baselines`) and the other dependencies as follows, using [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/) to create a virtual environment:
```
mkvirtualenv rl-noisy --python=/usr/bin/python3
pip install -r requirements.txt
cd gym-atari/baselines
pip install -e .
```

## Examples
- Classic control (DQN on Cartpole)
```
cd gym-control
python dqn_cartpole.py                                           # true reward
python dqn_cartpole.py --error_positive 0.1 --reward noisy       # perturbed reward
python dqn_cartpole.py --error_positive 0.1 --reward surrogate   # surrogate reward (Wang et al., 2020)
python dqn_cartpole.py --error_positive 0.1 --reward peer        # peer reward (ours)
```
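
The perturbed and surrogate modes above can be illustrated with a small standalone sketch. It assumes a binary reward taking the values `r_minus`/`r_plus` that is flipped with class-dependent probabilities `e_minus` = P(observe `r_plus` | true `r_minus`) and `e_plus` = P(observe `r_minus` | true `r_plus`). We assume, without having checked `dqn_cartpole.py`, that `--error_positive` plays the role of `e_plus`. The surrogate is the unbiased estimator for this binary noise model described by Wang et al. (2020). The function names are hypothetical and do not come from the repository.

```python
import random


def perturb_reward(r, r_minus, r_plus, e_minus, e_plus, rng=random):
    """Hypothetical helper: flip a binary reward with class-dependent noise.

    e_minus = P(observed r_plus | true r_minus)
    e_plus  = P(observed r_minus | true r_plus)
    """
    if r == r_plus:
        return r_minus if rng.random() < e_plus else r_plus
    return r_plus if rng.random() < e_minus else r_minus


def surrogate_reward(r_obs, r_minus, r_plus, e_minus, e_plus):
    """Hypothetical helper: unbiased surrogate for an observed binary reward.

    Under the noise model above, its expectation equals the true reward,
    provided the flip rates satisfy e_minus + e_plus < 1.
    """
    denom = 1.0 - e_minus - e_plus
    if denom <= 0.0:
        raise ValueError("surrogate reward requires e_minus + e_plus < 1")
    if r_obs == r_plus:
        return ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    return ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom


if __name__ == "__main__":
    rng = random.Random(0)
    r_minus, r_plus, e_minus, e_plus = -1.0, 1.0, 0.0, 0.1
    n = 100000
    total = 0.0
    for _ in range(n):
        # Observe a noisy version of a true r_plus, then correct it.
        r_obs = perturb_reward(r_plus, r_minus, r_plus, e_minus, e_plus, rng)
        total += surrogate_reward(r_obs, r_minus, r_plus, e_minus, e_plus)
    print("mean surrogate reward: %.3f" % (total / n))
```

Running it should print a mean close to 1.0 (the true `r_plus`), whereas averaging the raw noisy reward would give about 0.8 in this setting. The estimator needs the flip rates, which is why the surrogate runs pass `--error_positive`.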

## Reproduce the Results
To reproduce all the results reported in the paper, refer to the `scripts/` folders in `rl-noisy-reward-control` and `rl-noisy-reward-atari`:
- `gym-control/scripts`
  - Cartpole
    - `train-dqn.sh` (DQN)
    - `train-duel-dqn.sh` (Dueling-DQN)
    - `train-pattern.sh` (DQN & Dueling-DQN with different dynamic peer penalty schedules)
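
`train-pattern.sh` varies the peer penalty during training. As a rough illustration only, the sketch below computes a peer-style reward as the observed noisy reward minus `xi` times the noisy reward of a randomly drawn peer sample, and moves `xi` along a linear schedule. Both the peer-sampling rule (drawing uniformly from a list of previously observed rewards) and the linear schedule with its default endpoints are assumptions made for this example; the exact peer reward and the schedules used in the paper are defined by the training code and scripts.

```python
import random


def peer_reward(r_obs, peer_rewards, xi, rng=random):
    """Hypothetical helper: penalize the observed noisy reward by a random peer's.

    Assumption: the peer term is approximated by the noisy reward of one
    transition drawn uniformly from `peer_rewards` (e.g. a replay buffer).
    """
    if not peer_rewards:
        return r_obs  # no peer samples available yet, so apply no penalty
    return r_obs - xi * rng.choice(peer_rewards)


def linear_xi_schedule(step, total_steps, xi_start=0.0, xi_end=0.2):
    """Hypothetical schedule: interpolate xi linearly from xi_start to xi_end.

    The default endpoints are illustrative, not values from the paper.
    """
    frac = min(max(step / float(total_steps), 0.0), 1.0)
    return xi_start + frac * (xi_end - xi_start)


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = [-1.0, 1.0, 1.0, 1.0]  # toy noisy rewards from earlier transitions
    total = 1000
    for step in (0, 500, 1000):
        xi = linear_xi_schedule(step, total)
        print(step, xi, peer_reward(1.0, buffer, xi, rng))
```

Setting `xi_start == xi_end` recovers a constant penalty, so a schedule function like this makes it easy to compare fixed and dynamic variants side by side.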

If you have eight available GPUs (each with more than 8 GB of memory), you can run the `*.sh` scripts directly, one at a time. Otherwise, follow the instructions inside the scripts to run the experiments. Training a policy usually takes one to two hours on a GTX 1080 Ti.
