### Prerequisites

Use a **Python 3.8** environment. Before proceeding, install the dependencies with the following command:

```bash
pip install --no-deps -r requirements.txt
```

This command installs the packages listed in `requirements.txt`. Because of `--no-deps`, pip installs only those packages and does not pull in their own dependencies.
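
As a quick sanity check before installing, the interpreter version can be verified from Python itself. This is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys


def is_supported_python(version_info=sys.version_info, required=(3, 8)):
    """Return True when the major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if not is_supported_python():
        found = "{}.{}".format(*sys.version_info[:2])
        print(f"Warning: expected Python 3.8, found {found}")
```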

------

### 1. `run_reverse.py`

- **Purpose:**
  - This script generates the anchor-seeking trajectory.
  - Running it saves a `reverse_imagination.pkl` file in the `result` directory under the current working directory.
  - The implementation is end-to-end: once the reverse dynamics model is trained, the `.pkl` file for the anchor-seeking trajectory is generated immediately.
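
To confirm that the file was written, it can be opened with the standard `pickle` module. The sketch below assumes the default location `result/reverse_imagination.pkl` relative to the working directory and makes no assumption about the structure of the stored object; the helper names are hypothetical. Only unpickle files you generated yourself, since `pickle` can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_reverse_imagination(path="result/reverse_imagination.pkl"):
    """Unpickle the file and return whatever object it contains."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a short, structure-agnostic summary of the loaded object."""
    if isinstance(obj, dict):
        return f"dict with keys {sorted(map(str, obj))}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__


if __name__ == "__main__":
    path = Path("result/reverse_imagination.pkl")
    if path.exists():
        print(describe(load_reverse_imagination(path)))
    else:
        print(f"{path} not found; run run_reverse.py first")
```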

------

### 2. `run_mopo.py`

- **Purpose:**
  - This script provides an end-to-end implementation to:
    1. Train the dynamics model.
    2. Train the anchor-seeking policy.
    3. Test the policy using the COCOA framework.
- **Important Notes:**
  - Training the anchor-seeking policy requires the data file generated by `run_reverse.py`. Pass its path with the `--load_reverse_imagination_path` argument.
  - To reuse pre-trained dynamics, policy, or anchor-seeker models, pass their paths with the `--load_dynamics_path`, `--load_policy_path`, and `--load_anchor_seeker_path` arguments. This lets you run experiments more quickly.
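
When scripting several runs, the optional path arguments above can be assembled programmatically. The helper below is a hypothetical sketch, not part of this repository: it turns keyword arguments into command-line flags and skips any path left as `None`. The file paths in the usage lines are placeholders.

```python
def optional_path_args(**paths):
    """Turn keyword arguments into CLI flags, skipping values that are None."""
    args = []
    for name, value in paths.items():
        if value is not None:
            args += [f"--{name}", str(value)]
    return args


# Placeholder paths; replace them with your own files.
extra_args = optional_path_args(
    load_reverse_imagination_path="./your/directory.pkl",
    load_dynamics_path=None,  # None means: do not pass this flag
)
print(extra_args)
```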

### Command example

- For convenience (e.g. speed or suppressing warnings), you can set the following environment variables, for example by prefixing them to the commands below:

```bash
OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 VECLIB_MAXIMUM_THREADS=1 NUMEXPR_NUM_THREADS=1 D4RL_SUPPRESS_IMPORT_ERROR=1
```

1. **run_reverse.py**

   `python run_example/run_reverse.py --dynamics_hidden_dims 200 200 200 200 --reverse_policy_mode divergent --load_dynamics_path dummy --task halfcheetah-medium-expert-v2 --rollout_epoch 100 --rollout_length 5 --scale_coef 0.8 --noise_coef 0.1 --seed 0`

2. **run_mopo.py**

   `python run_example/run_mopo.py --dynamics_hidden_dims 200 200 200 200 --hidden_dims 100 100 --tr_hidden_dims 64 64 --anchor_seeker_hidden_dims 100 100 --embedding_dim 4 --policy_train True --asp_which both --actor_horizon_len 1 --critic_horizon_len 1 --task halfcheetah-medium-expert-v2 --penalty_coef 2.5 --rollout_length 5 --seed 0 --load_reverse_imagination_path ./your/directory.pkl`
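
The environment variables and commands above can also be combined from a small Python launcher, which may be handy when sweeping over seeds or tasks. This is a sketch under stated assumptions: the helper names (`build_env`, `build_command`, `run_script`) are hypothetical, and the launcher is assumed to be run from the repository root so that the `run_example/` paths resolve.

```python
import os
import subprocess
import sys

# Variables taken from the environment-variable example above.
THREAD_ENV = {
    "OMP_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "VECLIB_MAXIMUM_THREADS": "1",
    "NUMEXPR_NUM_THREADS": "1",
    "D4RL_SUPPRESS_IMPORT_ERROR": "1",
}


def build_env(base=None):
    """Return a copy of `base` (default: os.environ) with THREAD_ENV applied."""
    env = dict(os.environ if base is None else base)
    env.update(THREAD_ENV)
    return env


def build_command(script, args):
    """Build the argument list for running `script` with this interpreter."""
    return [sys.executable, script, *args]


def run_script(script, args):
    """Run the script and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(script, args), env=build_env(), check=True)


# Example usage (commented out so that importing this file has no side effects):
# run_script(
#     "run_example/run_reverse.py",
#     ["--task", "halfcheetah-medium-expert-v2", "--seed", "0"],
# )
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` makes a failed run stop a sweep instead of continuing silently.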