# OMDA: Offline Model-Guided Distribution-Aware Offline-to-Online Reinforcement Learning


🧵 OMDA is a novel offline-to-online RL algorithm that freezes the pre-trained offline Q-function network and queries it for the offline Q-value of each state-action pair, delivering compact offline information without further training on the offline data. This offline Q-value is combined with the standard online target Q-value in the Bellman equation, weighted by a distribution-aware coefficient, to form a combined target. The coefficient is trained with a conditional variational auto-encoder (C-VAE) that models the behavior-policy distribution of the offline dataset, and it captures the confidence in the offline information for each state-action pair.
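
The combined target described above can be sketched in a few lines. This is only an illustration, not the code in this repository: it assumes the two terms are mixed as a convex combination and that the coefficient lies in [0, 1], neither of which is spelled out here. The names `Transition`, `combined_target`, and `confidence` are hypothetical, and how the C-VAE produces the coefficient is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    reward: float
    done: bool
    next_q_online: float  # online target-network estimate for the next state-action pair
    q_offline: float      # frozen offline critic's estimate for the current (s, a)
    confidence: float     # distribution-aware coefficient, ASSUMED to lie in [0, 1]


def combined_target(t: Transition, gamma: float = 0.99) -> float:
    """Blend the online Bellman target with the frozen offline Q-value.

    ASSUMPTION: a convex combination weighted by the coefficient. The exact
    form used by OMDA may differ.
    """
    if not 0.0 <= t.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Standard one-step target; bootstrapping is dropped on terminal transitions.
    online_target = t.reward + gamma * (0.0 if t.done else t.next_q_online)
    # Higher confidence in the offline information pulls the target
    # toward the frozen offline Q-value.
    return (1.0 - t.confidence) * online_target + t.confidence * t.q_offline
```

Under this assumption, a coefficient of 0 recovers the ordinary online target, while a coefficient of 1 uses the offline Q-value alone.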

----


## Getting started

```bash
pip install -r requirements.txt
```

## Training all the tasks

```bash
bash run.sh
```

## Training a specific task

```bash
python algorithms/<mode>/<algo_name>.py
```

where `<mode>` is one of "offline" (trains the offline models of the state-of-the-art algorithms), "finetune" (the vanilla offline-to-online RL algorithms), or "OMDA" (the OMDA-enhanced algorithms), and `<algo_name>` is the name of the algorithm to run: "awac", "cql", "iql", or "cal_ql".
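
As a convenience, the placeholders can be filled in programmatically. The helper below is a hypothetical sketch, not part of this repository: `build_command` only assembles the command from the modes and algorithm names listed above, and it assumes a script exists for every mode/algorithm combination, which has not been checked.

```python
MODES = ("offline", "finetune", "OMDA")
ALGOS = ("awac", "cql", "iql", "cal_ql")


def build_command(mode: str, algo_name: str) -> list[str]:
    """Return the argv list for one training run (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    if algo_name not in ALGOS:
        raise ValueError(f"unknown algorithm {algo_name!r}; expected one of {ALGOS}")
    # ASSUMPTION: every mode/algorithm pair maps to algorithms/<mode>/<algo_name>.py
    return ["python", f"algorithms/{mode}/{algo_name}.py"]


if __name__ == "__main__":
    # Print the command for the OMDA-enhanced CQL run; the list can also be
    # passed to subprocess.run.
    print(" ".join(build_command("OMDA", "cql")))
```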
