## Training and Evaluation Instructions

1.  **Create and activate the Anaconda virtual environment `storm_var`**.

      - To create the environment from `environment.yml`:

        ```bash
        conda env create -f environment.yml
        ```

      - To activate the environment:

        ```bash
        conda activate storm_var
        ```
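
    The training and evaluation scripts assume that `storm_var` is active. As an optional safeguard, a script can check `CONDA_DEFAULT_ENV`, which `conda activate` sets to the name of the environment it activated. The `require_env` helper below is hypothetical and not part of this repository. It is a minimal sketch that assumes the environment is activated by name rather than by path.

    ```shell
    # Hypothetical helper (not part of this repository): fail early unless the
    # expected conda environment is active. `conda activate <name>` exports the
    # activated environment's name in CONDA_DEFAULT_ENV.
    require_env() {
        local expected="${1:-storm_var}"
        if [ "${CONDA_DEFAULT_ENV:-}" != "${expected}" ]; then
            echo "error: expected conda env '${expected}', found '${CONDA_DEFAULT_ENV:-none}'" >&2
            echo "hint: run 'conda activate ${expected}' first" >&2
            return 1
        fi
    }

    # Example, near the top of train.sh or eval.sh:
    # require_env storm_var || exit 1
    ```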

-----

2.  **Train the agent.**

    ```shell
    bash train.sh
    ```

    The `train.sh` script sets the GPU, environment, seed, entropy coefficient, and run name for a training run:

    ```shell
    export CUDA_VISIBLE_DEVICES=1
    env_name=ALE/Assault-v5
    seed=1
    entropy_coef=3e-4
    python -u train.py \
        -n "${env_name}/adaptive_weight_entropy${entropy_coef}/${seed}" \
        -seed "${seed}" \
        -config_path "config_files/STORM.yaml" \
        -env_name "${env_name}" \
        -trajectory_path "D_TRAJ/${env_name}.pkl" \
        -entropy_coef "${entropy_coef}"
    ```

      - `CUDA_VISIBLE_DEVICES` selects which GPU to use. GPU indices start at `0`, so set it to `0` on a single-GPU machine.
      - `env_name` can be any Atari game listed [here](https://gymnasium.farama.org/environments/atari/), given as a Gymnasium ID such as `ALE/Assault-v5`.
      - The `-n` option is the name of the **TensorBoard** logger and the checkpoint folder. You can change it, but we recommend keeping the environment name first. **TensorBoard** logs are written to the `runs` folder and checkpoints to the `ckpt` folder.
      - The `-seed` option sets the random seed for training. We evaluated our method using 5 seeds.
      - The `-config_path` option points to a YAML file that sets the model's hyperparameters. `config_files/STORM.yaml` contains the configuration used in our paper.
      - The `-entropy_coef` option sets the entropy coefficient (`3e-4` here), which is also embedded in the run name.
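
    Keeping the environment name first in `-n` also makes it easy to view a single game's runs in **TensorBoard**. The sketch below assumes TensorBoard is installed in `storm_var` and that event files are written under `runs/<run name>`. `show_runs` is a hypothetical helper; `--logdir` and `--port` are standard `tensorboard` flags.

    ```shell
    # Hypothetical helper: open TensorBoard on all runs, or on one sub-folder
    # of runs/ such as a single game (for example ALE/Assault-v5).
    # Assumption: train.py writes event files under runs/<run name>.
    show_runs() {
        local logdir="runs${1:+/$1}"   # append the sub-folder only if given
        tensorboard --logdir "${logdir}" --port "${2:-6006}"
    }

    # Examples:
    # show_runs                        # every game, seed, and setting
    # show_runs ALE/Assault-v5 6007    # only the Assault runs, on port 6007
    ```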

    **Optional Parameters:**

      - The `-trajectory_path` option is used only when `UseDemonstration` is set to `True` in the YAML file (it is `False` by default). This corresponds to the experiments in the **STORM** paper in which environments such as **Freeway** are trained with expert trajectories. The trajectories are stored in the `D_TRAJ` folder.
      - The `-value_scalar` option defaults to `None`, which makes the value alignment weight adaptive. If you set it, the value alignment weight stays constant instead. `train.sh` does not pass this option, so add `-value_scalar <value>` to its `python` command to use it.
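
    The sketch below combines the optional parameters with the 5-seed protocol described above. It launches one training run per seed with the `train.sh` settings and appends `-value_scalar` only when a value is given. The function names, the `DRY_RUN` switch, the `value_scalar<value>` run-name tag, and the seed values 1 to 5 are all assumptions made for illustration, not part of the repository.

    ```shell
    # Sketch (hypothetical helpers, not shipped with the repository).
    # Builds the train.py arguments used in train.sh, adding -value_scalar only
    # when a value is passed; the result is stored in the global TRAIN_ARGS array.
    build_train_args() {
        local env_name="$1" seed="$2" entropy_coef="$3" value_scalar="${4:-}"
        local weight_tag="adaptive_weight"            # default: adaptive weight
        if [ -n "${value_scalar}" ]; then
            weight_tag="value_scalar${value_scalar}"  # keep constant-weight runs apart
        fi
        TRAIN_ARGS=(
            -n "${env_name}/${weight_tag}_entropy${entropy_coef}/${seed}"
            -seed "${seed}"
            -config_path "config_files/STORM.yaml"
            -env_name "${env_name}"
            -trajectory_path "D_TRAJ/${env_name}.pkl"
            -entropy_coef "${entropy_coef}"
        )
        if [ -n "${value_scalar}" ]; then
            TRAIN_ARGS+=(-value_scalar "${value_scalar}")
        fi
    }

    # Trains seeds 1-5 one after another. With DRY_RUN=1 it only prints each
    # command, which is useful for checking run names before launching.
    train_all_seeds() {
        local env_name="$1" entropy_coef="$2" value_scalar="${3:-}" seed
        for seed in 1 2 3 4 5; do
            build_train_args "${env_name}" "${seed}" "${entropy_coef}" "${value_scalar}"
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "python -u train.py ${TRAIN_ARGS[*]}"
            else
                python -u train.py "${TRAIN_ARGS[@]}"
            fi
        done
    }

    # Example (adaptive weight, as in train.sh):
    # export CUDA_VISIBLE_DEVICES=1
    # DRY_RUN=1 train_all_seeds ALE/Assault-v5 3e-4
    ```

    Passing a third argument to `train_all_seeds` switches to a constant value alignment weight and tags the run name accordingly, so those runs do not overwrite the adaptive ones in `runs` and `ckpt`. The seeds run sequentially on the selected GPU.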

-----

3.  **Evaluate the agent.** Evaluation results are written to a CSV file in the `eval_result` folder.

    ```shell
    bash eval.sh
    ```
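
    Once `bash eval.sh` finishes, the newest results file can be located and viewed with standard Unix tools. The CSV file names and columns are not documented here, so this sketch relies on neither. `latest_eval_csv` is a hypothetical helper that treats every `.csv` file anywhere under `eval_result` as a results file.

    ```shell
    # Hypothetical helper: print the most recently modified .csv file under
    # the given folder (default: eval_result), searching sub-folders too.
    latest_eval_csv() {
        find "${1:-eval_result}" -type f -name '*.csv' -exec ls -t {} + 2>/dev/null \
            | head -n 1
    }

    # Example: show the latest results as an aligned table.
    # column -s, -t < "$(latest_eval_csv)"
    ```

    `column -s, -t` splits on every comma, so quoted fields that contain commas will not line up correctly.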

    The `eval.sh` script sets the GPU, environment, run name, and seed used when testing an agent:

    ```shell
    export CUDA_VISIBLE_DEVICES=1
    env_name=ALE/Alien-v5
    run_name=ALE/Alien-v5/adaptive_weight_entropy3e-4/1
    seed=1
    python -u eval.py \
        -env_name "${env_name}" \
        -run_name "${run_name}" \
        -seed "${seed}" \
        -config_path "config_files/STORM.yaml"
    ```

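    The `-run_name` value must match the `-n` value used in `train.sh`. The example above evaluates an **Alien** agent; to evaluate the **Assault** agent trained in step 2, set `env_name=ALE/Assault-v5` and `run_name=ALE/Assault-v5/adaptive_weight_entropy3e-4/1`.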
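
    To avoid typing the run name twice, the evaluation command can rebuild it from the same parts used for `-n` in `train.sh`. In this sketch `make_run_name` and `eval_agent` are hypothetical helpers, and the evaluation seed is taken to be the training seed (as in `eval.sh`). The checkpoint check assumes checkpoints are saved under `ckpt/<run name>`; this is an assumption about the folder layout rather than documented behavior.

    ```shell
    # Hypothetical helper: rebuild the training run name (-n) from its parts.
    make_run_name() {
        local env_name="$1" entropy_coef="$2" seed="$3"
        printf '%s/adaptive_weight_entropy%s/%s\n' "${env_name}" "${entropy_coef}" "${seed}"
    }

    # Evaluate one trained agent; the run name is derived, never retyped.
    eval_agent() {
        local env_name="$1" entropy_coef="$2" seed="$3"
        local run_name
        run_name="$(make_run_name "${env_name}" "${entropy_coef}" "${seed}")"
        # Assumption: train.py saved checkpoints under ckpt/<run name>.
        if [ ! -d "ckpt/${run_name}" ]; then
            echo "error: ckpt/${run_name} not found; does -run_name match -n?" >&2
            return 1
        fi
        python -u eval.py \
            -env_name "${env_name}" \
            -run_name "${run_name}" \
            -seed "${seed}" \
            -config_path "config_files/STORM.yaml"
    }

    # Example: evaluate the Assault agent trained by train.sh.
    # export CUDA_VISIBLE_DEVICES=1
    # eval_agent ALE/Assault-v5 3e-4 1
    ```

    Because the name is built in one place, a typo cannot send evaluation looking for a run that was never trained; the function stops with an error instead.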