# Overview

The codebase consists of three folders: `generate_metaworld_demos` for generating Meta-World expert demonstrations, `timerewarder` for training the reward model, and `RL` for evaluating the reward model on downstream RL tasks.

# Environment Setup

- Install [MuJoCo](http://www.mujoco.org/) following the instructions given [here](https://github.com/facebookresearch/drqv2).

- Install the following libraries:
  ```shell
  sudo apt update
  sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
  ```

- Install other dependencies:

  ```shell
  conda create -n tr_env python=3.7
  conda activate tr_env
  pip install -r requirements.txt
  ```

  If installing `apex` causes problems, the following version may work:
  ```shell
  git clone -b 22.04-dev https://github.com/NVIDIA/apex.git
  cd apex
  pip install -v --no-cache-dir ./
  ```

# Data Preparation
TimeRewarder is trained on videos in `mp4` format. Place them in the directory set by `config.DATA.ROOT` in `timerewarder/config/config.yaml`. Training also needs `txt` files that list the videos, one file name per line, for example:

```shell
$ head -n 2 [config.DATA.TRAIN_FILE]
a.mp4
b.mp4

$ head -n 2 [config.DATA.VAL_FILE]
c.mp4
d.mp4
```
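
The list files can be produced with a few lines of shell. The following is a minimal sketch, not part of the repository: the helper name `make_video_lists`, the alphabetical ordering, and the choice to hold out the last `NUM_VAL` videos for validation are all assumptions, as is the convention (suggested by the example above) that entries are file names relative to `config.DATA.ROOT`.

```shell
# Hypothetical helper (not part of the repository): writes train/val list
# files with one .mp4 file name per line, relative to the data root.
# Usage: make_video_lists DATA_ROOT TRAIN_FILE VAL_FILE NUM_VAL
# The body runs in a subshell ( ... ) so its variables do not leak.
make_video_lists() (
    data_root="$1"; train_file="$2"; val_file="$3"; num_val="$4"

    # Collect .mp4 file names (without the directory), sorted so the split
    # is reproducible.
    all_file="$(mktemp)"
    find "$data_root" -maxdepth 1 -type f -name '*.mp4' -exec basename {} \; \
        | sort > "$all_file"

    # Assumed split: the last NUM_VAL names go to validation, the rest to training.
    total=$(($(wc -l < "$all_file")))
    if [ "$num_val" -gt "$total" ]; then num_val="$total"; fi
    num_train=$((total - num_val))

    awk -v n="$num_train" 'NR <= n' "$all_file" > "$train_file"
    awk -v n="$num_train" 'NR > n' "$all_file" > "$val_file"
    rm -f "$all_file"
)
```

For example, `make_video_lists ./videos ./train.txt ./val.txt 10` would hold out ten videos; the three paths here are placeholders and should match `config.DATA.ROOT`, `config.DATA.TRAIN_FILE`, and `config.DATA.VAL_FILE`.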

You can generate Meta-World expert demonstrations with `generate_metaworld_demos/generate_demo.py`.

# Running the Code

An example command for training TimeRewarder, which launches 8 processes on one node (typically one per GPU):

```shell
cd timerewarder
python -m torch.distributed.launch --nproc_per_node=8 main.py -cfg "configs/config.yaml" \
        --output "./output"
```
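
On a machine with a different number of GPUs, `--nproc_per_node` can be derived from the hardware instead of being hard-coded. This is a sketch under stated assumptions: `count_gpus` and `launch_training` are hypothetical helper names that do not exist in the repository, and falling back to a single process when `nvidia-smi` is unavailable is a choice made here for illustration.

```shell
# Hypothetical helper: print the number of NVIDIA GPUs reported by
# nvidia-smi, or 1 if nvidia-smi is missing or reports none.
count_gpus() {
    gpu_count=0
    if command -v nvidia-smi >/dev/null 2>&1; then
        # `nvidia-smi -L` prints one line per GPU, each starting with "GPU ".
        gpu_count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)
    fi
    if [ "${gpu_count:-0}" -lt 1 ]; then gpu_count=1; fi
    echo "$gpu_count"
}

# Hypothetical wrapper around the training command shown above;
# call it from inside the `timerewarder` directory.
launch_training() {
    python -m torch.distributed.launch --nproc_per_node="$(count_gpus)" main.py \
        -cfg "configs/config.yaml" --output "./output"
}
```

After defining both functions, `cd timerewarder && launch_training` would start one process per detected GPU. Note that `nvidia-smi -L` lists all installed GPUs and does not honor `CUDA_VISIBLE_DEVICES`.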

You can then use the trained reward model for downstream RL, replacing `[task_name]` and `[timerewarder_ckpt]` with the task name and the trained checkpoint:

```shell
cd RL
python train.py suite/metaworld_task=[task_name] seed=0 suite.num_train_frames=200000 cost_encoder='timerewarder' cost_encoder_ckpt=[timerewarder_ckpt]
```
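
RL results are usually reported over several seeds, so it can be convenient to wrap the command above in a loop. The sketch below is an illustration only: `run_seeds` and the `DRY_RUN` variable are hypothetical names introduced here, and the task name and checkpoint are passed in by the caller rather than assumed.

```shell
# Hypothetical helper: run the RL command above once per seed, or only
# print the commands when DRY_RUN=1. Call it from inside the `RL` directory.
# Usage: run_seeds TASK_NAME TIMEREWARDER_CKPT SEED [SEED ...]
run_seeds() {
    task="$1"; ckpt="$2"; shift 2

    # With DRY_RUN=1 each command is echoed instead of executed.
    runner=""
    if [ "${DRY_RUN:-0}" = "1" ]; then runner="echo"; fi

    for seed in "$@"; do
        $runner python train.py "suite/metaworld_task=$task" "seed=$seed" \
            suite.num_train_frames=200000 cost_encoder=timerewarder \
            "cost_encoder_ckpt=$ckpt"
    done
}
```

Running `DRY_RUN=1 run_seeds [task_name] [timerewarder_ckpt] 0 1 2` first prints the three commands so they can be checked before any training starts.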

