# Revisiting the Minimalist Approach to Offline Reinforcement Learning

## Dependencies & Docker setup
To set up a Python environment (with the dev tools of your choice; our workflow uses conda and Python 3.8),
install the requirements:

```commandline
python3 -m pip install -r requirements.txt
```

With this setup you also need to install the mujoco210 binaries by hand. This is not always straightforward,
but the following recipe worked for us:
```commandline
mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
```
You may also need to install additional dependencies for mujoco_py. 
We recommend following the official guide from [mujoco_py](https://github.com/openai/mujoco-py).
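
Before installing mujoco_py, it can help to confirm that the recipe above left things where they are expected. The following is a minimal sketch, not part of the repository: `check_mujoco` is a hypothetical helper, and it assumes the binaries were unpacked under `/root/.mujoco` as in the recipe.

```python
import os
from pathlib import Path


def check_mujoco(root, ld_library_path):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    bin_dir = Path(root) / "mujoco210" / "bin"
    if not bin_dir.is_dir():
        problems.append(f"missing directory: {bin_dir}")
    # LD_LIBRARY_PATH is a separator-delimited list; empty entries are skipped.
    entries = [Path(p) for p in ld_library_path.split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    issues = check_mujoco("/root/.mujoco", os.environ.get("LD_LIBRARY_PATH", ""))
    print("\n".join(issues) if issues else "MuJoCo layout looks fine")
```

This only checks the directory layout and the environment variable; it does not verify that mujoco_py itself builds or imports.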

### Docker

We also provide a simpler route: a Dockerfile that is already set up to work. All you have to do is build and run it :)
```commandline
docker build -t rebrac .
```
To run the container with the repository mounted at `/workspace/`:
```commandline
docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>:/workspace/" \
    --name rebrac \
    rebrac bash
```

### V-D4RL
To reproduce the V-D4RL experiments, you first need to download the V-D4RL datasets.

By default, the data is expected to be in a `vd4rl` directory inside the directory from which the training script is launched.
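
A quick check before launching a long run can catch a misplaced dataset early. This is a minimal sketch under the default layout described above; `resolve_vd4rl_dir` is a hypothetical helper and is not part of the training scripts.

```python
from pathlib import Path


def resolve_vd4rl_dir(launch_dir="."):
    """Return the `vd4rl` directory under `launch_dir`, or raise if it is absent."""
    data_dir = Path(launch_dir) / "vd4rl"
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"expected V-D4RL data in {data_dir.resolve()}; "
            "place the dataset there or launch training from another directory"
        )
    return data_dir
```

Calling `resolve_vd4rl_dir()` from the directory where you intend to start training either returns the path or fails with a message naming the location that was checked.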

## How to reproduce experiments

### Training

Configs for the main experiments are stored in `configs/rebrac/<task_type>` and `configs/rebrac-vis/<task_type>`.
All available hyperparameters are listed in `src/algorithms/rebrac.py` for D4RL and `src/algorithms/rebrac_torch_vis.py` for V-D4RL.

For example, to start ReBRAC training on the D4RL `halfcheetah-medium-v2` dataset, run:
```commandline
PYTHONPATH=. python3 src/algorithms/rebrac.py --config_path="configs/rebrac/halfcheetah/halfcheetah_medium.yaml"
```

For the V-D4RL `walker_walk-expert-v2` dataset, run:
```commandline
PYTHONPATH=. python3 src/algorithms/rebrac_torch_vis.py --config_path="configs/rebrac-vis/walker_walk/expert.yaml"
```
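
If you want to run every config in one task directory back to back, a small launcher can assemble the same commands. The sketch below is an illustration rather than a script shipped with the repository: `build_commands` and `run_all` are hypothetical names, and it assumes each task directory contains only `*.yaml` configs accepted by `--config_path`.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(script, config_dir):
    """Return one argv list per YAML config found directly inside `config_dir`."""
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[sys.executable, script, f"--config_path={cfg.as_posix()}"] for cfg in configs]


def run_all(script, config_dir, dry_run=True):
    """Print each command and, unless `dry_run` is set, execute them one by one."""
    env = dict(os.environ, PYTHONPATH=".")  # mirrors `PYTHONPATH=.` in the commands above
    for argv in build_commands(script, config_dir):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)
```

For example, `run_all("src/algorithms/rebrac.py", "configs/rebrac/halfcheetah")` prints the commands it would run; pass `dry_run=False` to execute them sequentially from the repository root.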

### Targeted Reproduction
To reproduce the results from our work, use the configs for [Weights & Biases Sweeps](https://docs.wandb.ai/guides/sweeps/quickstart) provided in `configs/sweeps`. Note that we do not supply code for either IQL or SAC-RND; in our work we relied on these implementations: [IQL (CORL)](https://github.com/tinkoff-ai/CORL), [SAC-RND (original implementation)](https://github.com/tinkoff-ai/sac-rnd).

| Paper element          | Sweeps to run from `configs/sweeps/`                         |
|------------------------|--------------------------------------------------------------|
| Tables 2, 3, 4         | `eval/rebrac_d4rl_sweep.yaml`, `eval/td3_bc_d4rl_sweep.yaml` |
| Table 5                | `eval/rebrac_visual_sweep.yaml`                              |
| Table 6                | All sweeps from `ablations`                                  |
| Figure 2               | All sweeps from `network_sizes`                              |
| Hyperparameter tuning  | All sweeps from `tuning`                                     |


### EOP and Performance Profiles
To reproduce the EOP and performance profile plots, see `eop/ReBRAC_ploting.ipynb`.

We provide the data required for plotting in `eop/bin` as pickled Python dicts, so it can easily be reused in your own work.
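
To reuse this data outside the notebook, the dicts can be loaded with the standard `pickle` module. The sketch below is a minimal example, not code from the repository: `load_pickled_dicts` is a hypothetical helper, and it assumes every file in `eop/bin` is a pickle (file names and the structure of each dict are not assumed).

```python
import pickle
from pathlib import Path


def load_pickled_dicts(directory="eop/bin"):
    """Load every regular file in `directory` with pickle, keyed by file stem."""
    loaded = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            obj = pickle.load(f)
        # Keep only dict payloads, since that is the documented format.
        if isinstance(obj, dict):
            loaded[path.stem] = obj
    return loaded
```

Unpickling executes code embedded in the file, so only load pickles from a source you trust, such as your own checkout of this repository.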
