# Bridging the Gap Between Offline SL and RL via Q-conditioned maximization

## Installation

Create a virtual environment named `env_stuff` with the command:<br>
```sh
python3 -m venv env_stuff
```
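
Activate the environment before installing packages so that `pip` installs into `env_stuff` rather than the system Python. A minimal sketch, assuming a POSIX shell (bash, zsh, sh) and that the command is run from the directory containing `env_stuff`; on Windows the activation script lives at `env_stuff\Scripts\activate` instead:<br>
```sh
# Create the environment only if it does not exist yet (same command as above).
[ -d env_stuff ] || python3 -m venv env_stuff
# Activate it in the current shell; `.` is the portable spelling of `source`.
. env_stuff/bin/activate
# The activate script exports VIRTUAL_ENV; print it to confirm activation.
echo "$VIRTUAL_ENV"
```

Run `deactivate` to leave the environment when finished.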

Install the packages needed to run the code from the `requirements.txt` file:<br>
```sh
pip install -r requirements.txt
```

## Training

To train an RvS (decision-mlp) agent on pointmaze-umaze using max-Q:<br>
```sh
python train_max_dmlp_vae.py dataset_name=pointmaze-umaze-v0 augment_data=True augment_prob=0.5
```

To train a DT (decision-transformer) agent on pointmaze-umaze using max-Q:<br>
```sh
python train_max_dt_vae.py dataset_name=pointmaze-umaze-v0 augment_data=True augment_prob=0.5
```

## Datasets
To collect the pointmaze-large dataset with $10^6$ transitions and seed 1:<br>
```sh
python collect_pointmaze_data.py pointmaze-large-v0 1 1000000
```
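
To collect the same dataset under several seeds, the command above can be wrapped in a small shell loop. This is only a sketch: it assumes the positional arguments are dataset name, seed, and number of transitions, in that order (as the example values suggest). The helper name `collect_cmds` and the seed list `1 2 3` are hypothetical. The `echo` makes it a dry run that prints the commands instead of executing them:<br>
```sh
# Print one collection command per seed passed as an argument.
# Assumed argument order: <dataset_name> <seed> <num_transitions>.
collect_cmds() {
  for seed in "$@"; do
    echo "python collect_pointmaze_data.py pointmaze-large-v0 $seed 1000000"
  done
}

# Dry run for three illustrative seeds.
collect_cmds 1 2 3
```

To execute the commands rather than print them, pipe the output to the shell, for example `collect_cmds 1 2 3 | sh`.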

To collect the antmaze-umaze dataset with $10^6$ transitions and seed 1:<br>
```sh
python collect_antmaze_data.py antmaze-umaze-v0 1 1000000
```
