# Smooth Exploration for Robotic Reinforcement Learning

This folder contains the code to replicate the main results from the paper "Smooth Exploration for Robotic Reinforcement Learning".

We used a modified version of [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) for the algorithms and a modified version of [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo) for running the experiments.

The code and instructions to reproduce results from the appendix can be found in `appendix_results_code/`.


## Installation

1. Install the master version of SB3:
```
pip install git+https://github.com/DLR-RM/stable-baselines3
```

Note: Please refer to the [PyTorch](https://pytorch.org/get-started/locally/) installation instructions if you are using the Anaconda distribution or do not need GPU support.

2. Install the modified version of SB3:
```
cd gsde_custom/
pip install -e .
```


3. From the root of this folder, install the dependencies for the modified version of the RL Zoo:
```
cd rl_zoo_gsde/
pip install -r requirements.txt
```

The code of the gSDE distribution can be found in `gsde_custom/gsde_custom/common/distributions.py`.

## Run the experiments

All commands below must be run from the `rl_zoo_gsde/` folder.

Each command below is shown for a single environment (`HalfCheetahBulletEnv-v0`). To run the full benchmark, use the scripts in the `scripts/` folder, which generate the proper command-line arguments.

For instance, see `rl_zoo_gsde/scripts/create_baselines_jobs.py` for the unstructured noise runs.
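The job-creation scripts are not reproduced here. As a rough idea of what they automate, the following hypothetical Python sketch assembles the `train.py` command for each exploration strategy from the `-params` overrides used in the example commands of this section. The `BASELINES` table and the `build_command` helper are illustrative names, not part of `rl_zoo_gsde/`, and the sketch leaves out anything else the real scripts may handle (such as seeds or job scheduling).

```python
import shlex

# Hypothetical table: -params overrides per exploration strategy,
# copied from the example commands in this README.
BASELINES = {
    "gsde": ["use_sde:True", "sde_sample_freq:8"],
    "unstructured": ["use_sde:False"],
    "ou": [
        "use_sde:False",
        "deterministic_exploration:True",
        "noise_std:0.2",
        "noise_type:'ornstein-uhlenbeck'",
    ],
    "no_noise": ["use_sde:False", "deterministic_exploration:True"],
    "param_noise": [
        "use_sde:False",
        "deterministic_exploration:True",
        "use_param_noise:True",
        "policy_kwargs:dict(net_arch=[400, 300], layer_norm=True)",
    ],
}

# Evaluation settings shared by every run in the examples.
EVAL_ARGS = ["--eval-episodes", "20", "--eval-freq", "10000", "--n-eval-envs", "5"]


def build_command(baseline: str, env_id: str, algo: str = "custom_sac") -> str:
    """Return a shell-ready train.py command for one baseline and one environment."""
    args = ["python", "train.py", "--algo", algo, "--env", env_id, "-params"]
    args += BASELINES[baseline]
    args += EVAL_ARGS
    # shlex.quote escapes tokens containing spaces or quotes for a POSIX shell.
    return " ".join(shlex.quote(arg) for arg in args)


for name in BASELINES:
    print(build_command(name, "HalfCheetahBulletEnv-v0"))
```

Letting `shlex.quote` escape each token avoids the hand-written quoting that the OU (`"'ornstein-uhlenbeck'"`) and parameter noise (`'dict(...)'`) commands otherwise need.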

### gSDE

```
python train.py --algo custom_sac --env HalfCheetahBulletEnv-v0 -params use_sde:True sde_sample_freq:8 --eval-episodes 20 --eval-freq 10000 --n-eval-envs 5
```
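`sde_sample_freq:8` controls how often the gSDE exploration matrix is resampled. The sketch below illustrates the idea under simplifying assumptions: the noise is the product of the policy features `z(s)` and a random matrix `theta_eps` with entries drawn from `N(0, sigma^2)`, and `theta_eps` is redrawn every `sample_freq` steps. Here `sigma` is a fixed scalar and the features are passed in directly, whereas in gSDE the standard deviation is learned and the features come from the actor network; the actual implementation is in `gsde_custom/gsde_custom/common/distributions.py`. All function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_exploration_matrix(feature_dim: int, action_dim: int, sigma: float,
                              rng: random.Random) -> Matrix:
    """Draw theta_eps with i.i.d. N(0, sigma^2) entries, shape (feature_dim, action_dim)."""
    return [[rng.gauss(0.0, sigma) for _ in range(action_dim)] for _ in range(feature_dim)]


def gsde_noise(features: List[float], theta: Matrix) -> List[float]:
    """Noise eps(s) = z(s)^T theta_eps: deterministic given the features and theta_eps."""
    action_dim = len(theta[0])
    return [sum(z * row[j] for z, row in zip(features, theta)) for j in range(action_dim)]


def rollout_noise(features_per_step: List[List[float]], action_dim: int, sigma: float,
                  sample_freq: int, seed: int = 0) -> List[List[float]]:
    """Noise for each step, resampling theta_eps every `sample_freq` (> 0) steps."""
    rng = random.Random(seed)
    theta: Matrix = []
    noises = []
    for t, features in enumerate(features_per_step):
        if t % sample_freq == 0:
            theta = sample_exploration_matrix(len(features), action_dim, sigma, rng)
        noises.append(gsde_noise(features, theta))
    return noises


# Same features at every step, so the noise only changes when theta_eps is resampled.
features = [[1.0, 0.5, -0.2]] * 16
noise = rollout_noise(features, action_dim=2, sigma=0.5, sample_freq=8)
print(noise[0], noise[7], noise[8])
```

With identical features, the noise is constant within each block of 8 steps. In a real rollout the features change with the state, so the noise varies smoothly along the trajectory instead of being drawn independently at every step.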

### Unstructured Noise Baseline

```
python train.py --algo custom_sac --env HalfCheetahBulletEnv-v0 -params use_sde:False --eval-episodes 20 --eval-freq 10000 --n-eval-envs 5
```
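With `use_sde:False`, SAC explores with unstructured Gaussian noise. The sketch below assumes the standard SAC squashed Gaussian policy (as used by SB3's `SAC`): it draws `u ~ N(mean, std^2)` independently at every step and returns `tanh(u)`. The `mean` and `log_std` values would normally come from the actor and are fixed here. It also measures the lag-1 autocorrelation of the sampled actions to show that consecutive exploration steps are uncorrelated. Function names are hypothetical, and this is not the `custom_sac` code.

```python
import math
import random
from typing import List


def sample_sac_action(mean: List[float], log_std: List[float], rng: random.Random) -> List[float]:
    """Squashed Gaussian sample: u ~ N(mean, exp(log_std)^2) per dimension, then a = tanh(u)."""
    return [math.tanh(rng.gauss(m, math.exp(s))) for m, s in zip(mean, log_std)]


def lag1_autocorrelation(xs: List[float]) -> float:
    """Sample correlation between consecutive values x_t and x_{t+1}."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


rng = random.Random(0)
# Fixed mean and log-std for one action dimension (stand-ins for actor outputs).
actions = [sample_sac_action([0.0], [0.0], rng)[0] for _ in range(10_000)]
print(f"lag-1 autocorrelation: {lag1_autocorrelation(actions):.3f}")  # close to 0
```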

### Correlated Noise (OU Noise) Baseline

```
python train.py --algo custom_sac --env HalfCheetahBulletEnv-v0 -params use_sde:False deterministic_exploration:True noise_std:0.2 noise_type:"'ornstein-uhlenbeck'" --eval-episodes 20 --eval-freq 10000 --n-eval-envs 5
```
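The OU baseline replaces per-step Gaussian noise with temporally correlated Ornstein-Uhlenbeck noise added to the policy action (presumably the deterministic action, given `deterministic_exploration:True`). A minimal sketch of the discretized process is shown below. It uses the same update as SB3's `OrnsteinUhlenbeckActionNoise`, `x_{t+1} = x_t + theta (mu - x_t) dt + sigma sqrt(dt) N(0, 1)`, with SB3's defaults `theta=0.15` and `dt=1e-2`. Mapping `noise_std:0.2` to `sigma=0.2` is an assumption about `custom_sac`, and the function names are hypothetical.

```python
import math
import random
from typing import List


def ou_noise(n_steps: int, sigma: float = 0.2, theta: float = 0.15, dt: float = 1e-2,
             mu: float = 0.0, seed: int = 0) -> List[float]:
    """Euler-Maruyama discretization of an Ornstein-Uhlenbeck process (one action dimension)."""
    rng = random.Random(seed)
    x = 0.0  # start from zero noise
    samples = []
    for _ in range(n_steps):
        # Mean-reverting drift towards mu plus a scaled Gaussian increment.
        x = x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def lag1_autocorrelation(xs: List[float]) -> float:
    """Sample correlation between consecutive values x_t and x_{t+1}."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


noise = ou_noise(5_000)
print(f"lag-1 autocorrelation: {lag1_autocorrelation(noise):.3f}")  # close to 1
```

Unlike the unstructured case, consecutive OU samples are strongly correlated, so the perturbation drifts slowly instead of changing sign at random from one step to the next.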

### No Noise Baseline

```
python train.py --algo custom_sac --env HalfCheetahBulletEnv-v0 -params use_sde:False deterministic_exploration:True --eval-episodes 20 --eval-freq 10000 --n-eval-envs 5
```

### Parameter Noise Baseline

```
python train.py --algo custom_sac --env HalfCheetahBulletEnv-v0 -params use_sde:False deterministic_exploration:True use_param_noise:True policy_kwargs:'dict(net_arch=[400, 300], layer_norm=True)' --eval-episodes 20 --eval-freq 10000 --n-eval-envs 5
```
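Parameter noise perturbs the policy's weights instead of its actions. The sketch below follows the adaptive scheme of Plappert et al. (2018), "Parameter Space Noise for Exploration": Gaussian noise with scale `sigma` is added to every weight, and `sigma` is multiplied by `alpha` when the action-space distance between the perturbed and unperturbed policies is at most a target `delta`, and divided by `alpha` otherwise. Whether `use_param_noise:True` uses exactly this adaptation rule is an assumption. The tiny linear-tanh policy, the target of 0.2 and all function names are hypothetical, and the layer normalization enabled by `layer_norm=True` (used in that work so that one noise scale suits all layers) is omitted for brevity.

```python
import math
import random
from typing import List

Weights = List[List[float]]  # one row of weights per action dimension


def policy(weights: Weights, state: List[float]) -> List[float]:
    """Tiny deterministic linear policy with tanh squashing (stand-in for the actor network)."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def perturb(weights: Weights, sigma: float, rng: random.Random) -> Weights:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every weight."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def action_distance(w_a: Weights, w_b: Weights, states: List[List[float]]) -> float:
    """RMS difference between the two policies' actions over a batch of states."""
    sq, count = 0.0, 0
    for s in states:
        for a, b in zip(policy(w_a, s), policy(w_b, s)):
            sq += (a - b) ** 2
            count += 1
    return math.sqrt(sq / count)


def adapt_sigma(sigma: float, distance: float, target: float = 0.2, alpha: float = 1.01) -> float:
    """Grow sigma if the perturbed policy is too close, shrink it if it is too far."""
    return sigma * alpha if distance <= target else sigma / alpha


rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]  # 2 actions, 3 state dims
states = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(32)]
sigma = 0.1
for _ in range(5):
    perturbed = perturb(weights, sigma, rng)  # e.g. one perturbed policy per episode
    distance = action_distance(weights, perturbed, states)
    sigma = adapt_sigma(sigma, distance)
    print(f"distance={distance:.3f}  new sigma={sigma:.4f}")
```

Adapting `sigma` in action space, rather than fixing it in weight space, keeps the size of the behavioral change interpretable even though the sensitivity of the actions to the weights changes during training.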
