# Smooth Exploration for Robotic Reinforcement Learning

This folder contains the code to reproduce the PyBullet results reported in the appendix of the paper.

We used a modified version of [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) for the algorithms and a modified version of [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo) for running the experiments.


## Installation

1. Install the modified version of SB3:
```
cd sb3_gsde/
pip install -e .[extra]
```

Note: if you use the Anaconda distribution or do not need GPU support, please follow the [PyTorch](https://pytorch.org/get-started/locally/) installation instructions first.

To check the installation:
```
python -c "import stable_baselines3 as sb3;print(sb3.__version__)"
```

This should output `0.8.0gsde`.


2. Install the dependencies for the modified version of the rl zoo (run this from the root of this folder, not from inside `sb3_gsde/`):
```
cd rl_zoo_gsde/
pip install -r requirements.txt
```

The code of the gSDE distribution can be found in `sb3_gsde/stable_baselines3/common/distributions.py`.


## Run the PyBullet benchmark - gSDE

First, go into the `rl_zoo_gsde/` folder.

For one seed:
```
python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-episodes 10
```

Note: if you use a cluster to do multiple runs in parallel, you should pass the `-uuid` (unique id for each log folder) option to ensure there is no race condition when creating the log folder.
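To illustrate the race condition mentioned above: two jobs that start at the same time may both pick the same run index for their log folder. The sketch below is not the zoo's code. `make_log_dir` is a hypothetical helper that only shows how a unique suffix removes the collision.

```python
import os
import tempfile
import uuid


def make_log_dir(root, env_id, run_id, unique=True):
    """Hypothetical helper: create and return the log folder for one run."""
    # A random suffix keeps folder names distinct even when two jobs
    # computed the same run_id concurrently.
    suffix = f"_{uuid.uuid4().hex}" if unique else ""
    path = os.path.join(root, f"{env_id}_{run_id}{suffix}")
    os.makedirs(path, exist_ok=False)  # raises FileExistsError on a collision
    return path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # Two jobs that both believe they are run number 1.
    print(make_log_dir(root, "HalfCheetahBulletEnv-v0", 1))
    print(make_log_dir(root, "HalfCheetahBulletEnv-v0", 1))
```

With `unique=False`, the second call would fail because the folder already exists.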

Complete benchmark (10 seeds, 4 algorithms, 4 environments):
```
python scripts/run_benchmark.py
```
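For reference, the size of the benchmark grid can be sketched as follows. This is only an illustration and not the content of `scripts/run_benchmark.py`: the seed values, the `--seed` flag, and the environment IDs other than `HalfCheetahBulletEnv-v0` are assumptions, and `benchmark_commands` is a hypothetical helper.

```python
from itertools import product

ALGOS = ["a2c", "ppo", "sac", "td3"]
# Assumption: the other environments follow the same naming pattern.
ENVS = [
    "HalfCheetahBulletEnv-v0",
    "AntBulletEnv-v0",
    "HopperBulletEnv-v0",
    "Walker2DBulletEnv-v0",
]
SEEDS = range(10)  # assumption: the actual seed values may differ


def benchmark_commands(algos=ALGOS, envs=ENVS, seeds=SEEDS):
    """Return one train.py command (as an argument list) per run."""
    return [
        ["python", "train.py", "--algo", algo, "--env", env,
         "--seed", str(seed), "-uuid"]
        for algo, env, seed in product(algos, envs, seeds)
    ]


if __name__ == "__main__":
    commands = benchmark_commands()
    print(len(commands))  # 4 algorithms * 4 environments * 10 seeds = 160
    print(" ".join(commands[0]))
```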

Plot the results:

```
python scripts/all_plots.py -a a2c ppo sac td3 -e HalfCheetah Ant Hopper Walker2D -f logs -o logs/gsde
```

Plot from saved file:
```
python scripts/plot_from_file.py -i logs/gsde
```

Note: when plotting from file, ignore the titles of the bar charts and boxplots (they are only meaningful for the ablation study).


## Run the PyBullet benchmark - unstructured Gaussian exploration

Note: we use symbolic links, so the only difference between `rl_zoo_gsde/` and `rl_zoo_gaussian/` is the `hyperparams/` folder (containing the hyperparameters).

First, go into the `rl_zoo_gaussian/` folder.

For one seed:
```
python train.py --algo sac --env HalfCheetahBulletEnv-v0
```

Complete benchmark (10 seeds, 4 algorithms, 4 environments):
```
python scripts/run_benchmark.py
```


## Run the ablation study

All hyperparameters can be overridden directly from the command line with `-params`. For instance, to change the gSDE `sde_sample_freq` with PPO:

```
cd rl_zoo_gsde/
# Sample the noise function parameters every 16 steps
python train.py --algo ppo --env HalfCheetahBulletEnv-v0 -params sde_sample_freq:16
```
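To make the effect of the sampling frequency concrete, here is a toy, single-action illustration of state-dependent noise whose parameters are redrawn every `sample_freq` steps. It is a simplified sketch and not the implementation in `distributions.py`: the class name `PeriodicNoise` and its arguments are hypothetical.

```python
import random


class PeriodicNoise:
    """Toy state-dependent noise: eps = sum_i weights[i] * state[i]."""

    def __init__(self, state_dim, std, sample_freq, seed=0):
        self.rng = random.Random(seed)
        self.state_dim = state_dim
        self.std = std
        self.sample_freq = sample_freq
        self.steps = 0
        self.weights = None

    def __call__(self, state):
        # Redraw the noise weights only every `sample_freq` steps.
        if self.steps % self.sample_freq == 0:
            self.weights = [
                self.rng.gauss(0.0, self.std) for _ in range(self.state_dim)
            ]
        self.steps += 1
        # Between redraws the noise is a deterministic function of the state.
        return sum(w * s for w, s in zip(self.weights, state))


if __name__ == "__main__":
    noise = PeriodicNoise(state_dim=2, std=0.5, sample_freq=16)
    state = [1.0, 2.0]
    values = [noise(state) for _ in range(32)]
    # The same state yields the same noise within each block of 16 steps.
    print(len(set(values[:16])), len(set(values[16:])))
```

A larger `sample_freq` keeps the same weights for longer, so the same state keeps mapping to the same perturbation over more steps.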

To change the initial standard deviation for SAC:
```
cd rl_zoo_gsde/
python train.py --algo sac --env HalfCheetahBulletEnv-v0 -params policy_kwargs:"dict(log_std_init=-2, net_arch=[400, 300])"
```
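The quotes matter because the value contains spaces and must reach the script as a single `key:value` argument. As a rough sketch of how such an argument could be turned into a Python object (an assumption about the parsing, not the zoo's actual code; `parse_override` is a hypothetical helper):

```python
def parse_override(argument):
    """Split 'key:value' at the first colon and evaluate the value."""
    key, _, raw_value = argument.partition(":")
    # Only `dict` is exposed to eval here; this is a sketch, not a safe parser.
    value = eval(raw_value, {"__builtins__": {}, "dict": dict})
    return key, value


if __name__ == "__main__":
    print(parse_override("sde_sample_freq:16"))
    print(parse_override("policy_kwargs:dict(log_std_init=-2, net_arch=[400, 300])"))
```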
