# OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning

This repository is the official implementation of [OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning](https://openreview.net/forum?id=-YAqAIsxr7v). 

Our implementation is based on OAC (https://github.com/microsoft/oac-explore), which is released under the MIT license, so we retain OAC's original license file ```LICENSE_OAC``` in this repository. The main code of OVD-Explorer is in the file ```optimistic_exploration.py```.

## Requirements

To install requirements:

```setup
conda env create -f environment.yml
```

## Training

To train the models in the paper, run these commands:

```train
# for OVDE_G
python main.py --seed=0 --domain=ant --num_expl_steps_per_train_loop 1000 --num_trains_per_train_loop 1000 --alpha 0.05 --beta 3.2 --sigma 0 --z 0.5 --use_aleatoric --version 13 --ee
# for OVDE_Q
python main.py --seed=0 --domain=ant --num_expl_steps_per_train_loop 1000 --num_trains_per_train_loop 1000 --alpha 0.05 --beta 3.2 --sigma 0 --z 0.5 --use_aleatoric --use_quantile_cdf --version 14 --ee
```
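
If you want to run both variants over several seeds, the commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `build_sweep` and the seed list `(0, 1, 2)` are assumptions made for illustration, while the flags are copied verbatim from the two commands above. It only prints the command lines and does not start any training.

```python
import shlex

# Flags shared by both commands above (copied verbatim from this README).
COMMON = [
    "--domain=ant",
    "--num_expl_steps_per_train_loop", "1000",
    "--num_trains_per_train_loop", "1000",
    "--alpha", "0.05", "--beta", "3.2", "--sigma", "0", "--z", "0.5",
    "--use_aleatoric",
]

# Variant-specific flags, also taken from the commands above.
VARIANTS = {
    "OVDE_G": ["--version", "13", "--ee"],
    "OVDE_Q": ["--use_quantile_cdf", "--version", "14", "--ee"],
}


def build_command(variant, seed):
    """Return the argument list for one training run (hypothetical helper)."""
    return ["python", "main.py", f"--seed={seed}", *COMMON, *VARIANTS[variant]]


def build_sweep(seeds=(0, 1, 2)):
    """Return one command per (variant, seed) pair; the seed list is an assumption."""
    return [build_command(variant, seed) for variant in VARIANTS for seed in seeds]


if __name__ == "__main__":
    # Dry run: print each command so it can be inspected or piped to a shell.
    for cmd in build_sweep():
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run anywhere; once the output looks right, each line can be pasted into a terminal inside the conda environment.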

To reproduce the results shown in our Figure 6(a) and (b), initialize the folder as a git repository and then run this script:
```
./run.sh
```

All training data, including debug logs and model parameters, is saved in the folder ```data/master/```. In our submitted folder, we include the original data generated by training on the Ant-v2 task in ```data/master/```.

## Evaluation

The evaluation results are saved in the data folder. To reproduce Figure 6(a), run this command:

```eval
python -m plotting.plot_total_ant
```
