### Description

Code for Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL.


### Instructions

To train the advantage-weighted behavioral prior, use the script `train_prior.py`. This step must be completed before running the offline RL portion:


```
usage: train_prior.py [-h] [--n_epoch_qfn N_EPOCH_QFN]
                      [--n_epoch_prior N_EPOCH_PRIOR] [--save_path SAVE_PATH]
                      env_name

positional arguments:
  env_name

optional arguments:
  -h, --help            show this help message and exit
  --n_epoch_qfn N_EPOCH_QFN
  --n_epoch_prior N_EPOCH_PRIOR
  --save_path SAVE_PATH
```

For example, to train and save the behavioral prior for `hopper-medium-expert-v0`, run:
```
python3 train_prior.py 'hopper-medium-expert-v0' --n_epoch_qfn 10 --n_epoch_prior 30 --save_path 'exp/hopper_medium_expert_prior/'
```
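
When training priors for several environments, it can help to generate the command programmatically. The following is a minimal sketch, not part of this repository: the helper names `prior_save_path` and `train_prior_command` are hypothetical, the save-path naming convention is an assumption inferred from the single example above, and the epoch values are the ones from that example rather than the script's defaults. Only flags listed in the usage text are used.

```python
import re
import shlex


def prior_save_path(env_name, root="exp"):
    """Derive a save directory such as 'exp/hopper_medium_expert_prior/'.

    The naming convention is an assumption copied from the example above;
    train_prior.py accepts any --save_path.
    """
    # Drop a trailing version tag like '-v0', then swap hyphens for underscores.
    base = re.sub(r"-v\d+$", "", env_name).replace("-", "_")
    return f"{root}/{base}_prior/"


def train_prior_command(env_name, n_epoch_qfn=10, n_epoch_prior=30):
    """Build the argv list for train_prior.py using only flags from its usage text."""
    return [
        "python3", "train_prior.py", env_name,
        "--n_epoch_qfn", str(n_epoch_qfn),
        "--n_epoch_prior", str(n_epoch_prior),
        "--save_path", prior_save_path(env_name),
    ]


# Print the command for inspection instead of executing it.
cmd = train_prior_command("hopper-medium-expert-v0")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the training, assuming the working directory contains `train_prior.py`.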

Afterwards, run MABE with `mbpo.py`:

```
usage: mbpo.py [-h] [--env-name ENV_NAME] [--seed N] [--experiment EXPERIMENT]
               [--load LOAD] [--load_prior LOAD_PRIOR] [--coeff COEFF]
               [--rollout ROLLOUT] [--save_video] [--save_models]
               [--data_path DATA_PATH] [--gamma G] [--tau G] [--alpha G]
               [--policy POLICY] [--target_update_interval N]
               [--automatic_entropy_tuning G] [--hidden_size N] [--lr G]
               [--kl KL] [--num_networks E] [--num_elites E]
               [--pred_hidden_size E] [--reward_size E] [--replay_size N]
               [--model_retain_epochs A] [--model_train_freq A]
               [--rollout_batch_size A] [--epoch_length A]
               [--rollout_min_epoch A] [--rollout_max_epoch A]
               [--rollout_min_length A] [--rollout_max_length A]
               [--num_epoch A] [--min_pool_size A] [--real_ratio A]
               [--train_every_n_steps A] [--num_train_repeat A]
               [--max_train_repeat_per_step A] [--policy_train_batch_size A]
               [--init_exploration_steps A] [--model_type A] [--cuda]
```

For example, to reproduce our `hopper-medium-expert-v0` runs, use:

```
python mbpo.py --env-name hopper-medium-expert-v0 --num_epoch 100 --kl 0.1 --model_type 'tf' --experiment "hopper-med-expert" --load_prior '../../exp/hopper_medium_expert_prior/'
```
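
The two steps above (prior training, then MABE) can be chained from a single driver. The following is a rough sketch under stated assumptions rather than a script shipped with this code: `build_commands` and `run_pipeline` are hypothetical helpers, all flag values are copied from the two examples above, and the save and load paths are passed separately because the examples use different relative paths (`exp/...` versus `../../exp/...`). The working directory each script expects is not specified here, so the sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_commands(env_name, prior_save_path, prior_load_path, experiment,
                   n_epoch_qfn=10, n_epoch_prior=30, num_epoch=100, kl=0.1):
    """Return the two argv lists (prior training, then MABE) in run order."""
    train_prior = [
        "python3", "train_prior.py", env_name,
        "--n_epoch_qfn", str(n_epoch_qfn),
        "--n_epoch_prior", str(n_epoch_prior),
        "--save_path", prior_save_path,
    ]
    run_mabe = [
        "python", "mbpo.py",
        "--env-name", env_name,
        "--num_epoch", str(num_epoch),
        "--kl", str(kl),
        "--model_type", "tf",
        "--experiment", experiment,
        "--load_prior", prior_load_path,
    ]
    return [train_prior, run_mabe]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the pipeline if a stage exits with an error,
            # so MABE never starts without a trained prior.
            subprocess.run(cmd, check=True)


commands = build_commands(
    env_name="hopper-medium-expert-v0",
    prior_save_path="exp/hopper_medium_expert_prior/",
    prior_load_path="../../exp/hopper_medium_expert_prior/",
    experiment="hopper-med-expert",
)
run_pipeline(commands, dry_run=True)
```

Calling `run_pipeline(commands, dry_run=False)` would execute both stages in order; depending on where `mbpo.py` lives, a `cwd=` argument to `subprocess.run` may be needed so that the relative `--load_prior` path resolves.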

