# Improve within the Line: Refined Behavior Regularization for Offline Reinforcement Learning


## Environment
The results in the paper were collected with MuJoCo 210 (and mujoco-py 2.1.2.14) in OpenAI gym 0.18.3 on the D4RL datasets. Networks were trained with PyTorch 1.10.0 and Python 3.8.5.

```bash
conda create -n car python=3.8.5
conda activate car
pip install --no-cache-dir -r requirements.txt
```
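
Because the results depend on specific library versions, it can help to confirm what actually got installed. The snippet below is a minimal sketch, not part of this repository. It assumes the PyPI distribution names `torch`, `gym`, and `mujoco-py`, and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions listed in this README; distribution names are assumed to match PyPI.
EXPECTED = {"torch": "1.10.0", "gym": "0.18.3", "mujoco-py": "2.1.2.14"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build suffixes such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script inside the `car` environment prints one line per mismatched or missing package and nothing if all three versions match.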

### Offline RL

Run one of the following commands to train an offline RL agent on a D4RL task.

```bash
python main.py --config configs/offline/hopper-medium.yml
python main.py --config configs/offline/antmaze-large-diverse.yml
```

You can also pass any of the configs in `configs/offline` and `configs/online_finetune`.
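
Since each run is driven by a single `--config` file, training on several tasks amounts to invoking `main.py` once per config. The following is a rough sketch of a sequential launcher under that assumption. The function names `build_commands` and `run_all` are hypothetical and not part of this repository, and the sketch assumes configs use the `.yml` extension as in the commands above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config_dir, script="main.py"):
    """Return one training command per .yml file in config_dir, sorted by name."""
    configs = sorted(Path(config_dir).glob("*.yml"))
    return [[sys.executable, script, "--config", str(cfg)] for cfg in configs]


def run_all(config_dir, script="main.py", dry_run=True):
    """Print each command and, unless dry_run is set, run them in sequence."""
    for cmd in build_commands(config_dir, script):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
```

Calling `run_all("configs/offline")` only prints the commands; pass `dry_run=False` to actually launch the runs one after another.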

Configs for all tasks will be made public in the future.


#### Logging

This codebase logs to TensorBoard. You can view saved runs with:

```bash
tensorboard --logdir <run_dir>
```
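
This README does not state where runs are written, so finding a value for `<run_dir>` may take some searching. As a small sketch, the hypothetical helper below lists every directory under a root folder that contains a TensorBoard event file, relying only on the fact that such files have `tfevents` in their names.

```python
from pathlib import Path


def find_run_dirs(root):
    """Return sorted directories under root that contain TensorBoard event files."""
    root = Path(root)
    # TensorBoard event file names contain the substring "tfevents".
    return sorted({p.parent for p in root.rglob("*tfevents*") if p.is_file()})
```

For example, `find_run_dirs(".")` searches from the current directory, and any returned path can be passed to `tensorboard --logdir`.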

### Online Fine-tuning

Run the following command to fine-tune online on AntMaze, starting from the pretrained VAE and offline models.

```bash
python main_finetune.py --config configs/online_finetune/antmaze-large-diverse.yml --pretrain_model <pretrain_model/>
```

