* Note for reviewers: the cached data (Q functions, V functions, Bellman operators, policies, behavior policies, datasets, etc.) mentioned below is not uploaded because of the 100M file limit. We will release it in a GitHub repo later.

# Offline Model Selection: New Algorithms and Experiment Protocols - Code Instructions
## Installation
Below are our recommendations for experiment setup and configuration.
+ **Python Environment**
    + See `environment.yml`. We strongly recommend running our code with `python 3.10` to avoid potential incompatibilities.
    + If you are using Windows, we recommend installing inside `wsl`, because `mujoco 2.3.3` must be configured on a Linux system.
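
As a quick sanity check of the two recommendations above, the following minimal sketch reports whether the current interpreter is Python 3.10 and whether the platform is Linux (WSL also reports `Linux`). The function name `check_environment` is hypothetical and not part of the released code; it does not check the `mujoco` installation itself.

```python
import platform
import sys


def check_environment(version_info=None, system=None):
    """Return a list of warnings about the interpreter and OS (empty if none).

    Hypothetical helper: the arguments default to the running interpreter and
    platform, and can be overridden for testing.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    warnings = []
    # The README recommends Python 3.10.
    if tuple(version_info[:2]) != (3, 10):
        warnings.append(
            f"Python 3.10 is recommended, found {version_info[0]}.{version_info[1]}"
        )
    # The README recommends Linux (or WSL on Windows) for mujoco 2.3.3.
    if system != "Linux":
        warnings.append(f"Linux (or WSL) is recommended, found {system}")
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```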

## Main Usage
This part gives general guidance for reproducing the main results of **MF.G, MF.N, MF.OFF.G, MF.OFF.N, MB.G, MB.N**, including the **sample efficiency, misspecification, gap, convergence and sanity check** studies.
### Notes for (Nearly) On-Policy Data Reuse
We exemplify the code usage on MF.G.
+ Create a separate folder for your full experiment.
+ Copy `mf_on_policy_gravity.py` from `env_setups`, the folder that gathers the experiment configurations, and rename it to `parser.py`.
+ Copy all files from `general_code`.
+ (Optional) If you would like to reuse existing data:
    + Copy the policy folder from `data/shared_policy/policies`;
    + Copy the datasets folder from `data/mf_on_policy_gravity/datasets`;
    + Copy the Q function cache from `data/mf_on_policy_gravity/functions`;
    + Copy the V function cache from `data/mf_on_policy_gravity/v_functions`.
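
The steps above can be scripted. The following is a minimal sketch, not part of the released code: the function `assemble_experiment` and the `CACHE_SOURCES` mapping are hypothetical, and the destination names under `offline_data` (in particular mapping `functions` to `q_functions`) are an assumption based on the folder layout shown later in this section.

```python
import shutil
from pathlib import Path

# Assumed mapping: folder name under `offline_data` -> cache location in the repo.
CACHE_SOURCES = {
    "policies": "data/shared_policy/policies",
    "datasets": "data/mf_on_policy_gravity/datasets",
    "q_functions": "data/mf_on_policy_gravity/functions",
    "v_functions": "data/mf_on_policy_gravity/v_functions",
}


def assemble_experiment(repo_root, target, setup="mf_on_policy_gravity.py",
                        reuse_cache=True):
    """Build a self-contained experiment folder for one setup (here MF.G)."""
    repo_root, target = Path(repo_root), Path(target)
    target.mkdir(parents=True, exist_ok=True)

    # Copy all files from `general_code` into the experiment folder.
    shutil.copytree(repo_root / "general_code", target, dirs_exist_ok=True)

    # Copy the chosen configuration last and rename it to `parser.py`,
    # so nothing copied from `general_code` can overwrite it.
    shutil.copy2(repo_root / "env_setups" / setup, target / "parser.py")

    # Optional: reuse whichever cached folders are available locally.
    if reuse_cache:
        for name, source in CACHE_SOURCES.items():
            src = repo_root / source
            if src.is_dir():
                shutil.copytree(src, target / "offline_data" / name,
                                dirs_exist_ok=True)
    return target
```

Calling `assemble_experiment(".", "my_mf_g_run")` from the repository root would prepare `my_mf_g_run`; caches that are not present are simply skipped and would then be regenerated by the experiment code.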

After all of the above is done, your code and data should be organized as follows. If you instead generate the policies, datasets, or function estimates from scratch, the corresponding parts of `offline_data` will be re-cached automatically (in our experience this process may take up to 2-3 days).
+ `your_folder`
    + `offline_data`
        + `policies` - 0, 1, ..., 14; the evaluation policy class (which also induces a behavior policy class for **MF.G, MF.N** using epsilon-greedy mapping)
        + `datasets` - 0, 1, ..., 14
        + `q_functions` - 0, 1, ..., 14
        + `v_functions` - 0, 1, ..., 14
    + `ablation_observer.py`
    + ......
    + `validator.py`
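
Before launching, the layout above can be verified with a short script. This is a minimal sketch under stated assumptions: the helper `missing_cache_entries` is hypothetical, and it assumes each of the entries 0, 1, ..., 14 is a folder or file whose name (ignoring any file extension) is the index itself.

```python
from pathlib import Path

EXPECTED_FOLDERS = ("policies", "datasets", "q_functions", "v_functions")
EXPECTED_INDICES = range(15)  # 0, 1, ..., 14


def missing_cache_entries(experiment_dir):
    """List `offline_data` entries that are absent, as 'folder/index' strings."""
    offline = Path(experiment_dir) / "offline_data"
    missing = []
    for folder in EXPECTED_FOLDERS:
        directory = offline / folder
        # Accept both sub-folders ("3") and files ("3.pkl") as entry 3 (assumption).
        present = {p.stem for p in directory.iterdir()} if directory.is_dir() else set()
        missing.extend(
            f"{folder}/{i}" for i in EXPECTED_INDICES if str(i) not in present
        )
    return missing


if __name__ == "__main__":
    gaps = missing_cache_entries(".")
    if gaps:
        print("These entries are missing and would be regenerated:", ", ".join(gaps))
    else:
        print("offline_data looks complete.")
```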

Then run `python3 main.py` to launch the full experiment.

### Notes for Off-Policy Data Reuse
In off-policy experiments, we trained a set of behavior policies to "maximize" the distribution shift. In addition to following the same pipeline to collect the code and data for the **MF.OFF.G** or **MF.OFF.N** experiments, there is one **additional** step:
+ i) Replace the current `offline_dataset_collector.py`, `policy_trainer.py`, and `main.py` in your folder with the ones in `exlusive/off_policy`.
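
The replacement step can be sketched as follows. The helper `apply_off_policy_overrides` is hypothetical; the source path is written exactly as it appears above. It checks that all three files exist before copying anything, so the experiment folder is never left with a partial mix of on-policy and off-policy code.

```python
import shutil
from pathlib import Path

OFF_POLICY_FILES = ("offline_dataset_collector.py", "policy_trainer.py", "main.py")


def apply_off_policy_overrides(repo_root, target, source="exlusive/off_policy"):
    """Overwrite the three on-policy files in `target` with the off-policy ones."""
    src_dir = Path(repo_root) / source
    target = Path(target)

    # Fail before touching the experiment folder if any override is missing.
    missing = [name for name in OFF_POLICY_FILES if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {src_dir}: {', '.join(missing)}")

    for name in OFF_POLICY_FILES:
        shutil.copy2(src_dir / name, target / name)
    return [target / name for name in OFF_POLICY_FILES]
```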

### Notes for Model-Based Data Reuse
Model-based experiments require an offline cache of Bellman operators. In addition to following the same pipeline to collect the code and data for the **MB.G** or **MB.N** experiments, there is one **additional** step:
+ i) Copy the Bellman operators folder from `data/mf_on_policy_gravity/datasets`.

### TODO......

## Crucial Caveat
****
**Always make sure that the policies, behavior policies, datasets and function estimates match your desired experiment setup. For example, you cannot reuse the MF.OFF.G data cache in an MF.G experiment; doing so will lead to wrong results.**
****