
# This State Looks Like That: Self-Interpretable Reinforcement Learning Agents using Prototype Soft Actor-Critic

This is the implementation of ProtoSAC, a novel deep RL architecture that integrates a prototype-based actor 
into the Soft Actor-Critic (SAC) algorithm, enabling intrinsic interpretability in continuous action spaces. 

## Table of Contents
1. [Installation](#installation)
2. [Usage](#usage)
3. [Available Environments](#available-environments)
4. [Model Configuration](#model-configuration)
5. [Training](#training)
6. [Saving and Loading Models](#saving-and-loading-models)
7. [Prototype Visualization](#prototype-visualization)

## Installation
To set up the project environment, you will need to install the required dependencies. You can create a Conda environment by following these steps:

1. Clone the repository:
   ```bash
   git clone https://github.com/KRLGroup/PrototypeSAC.git
   cd PrototypeSAC
   ```

2. Create the environment from the `env.yml` file:
   ```bash
   conda env create -f env.yml
   ```

3. Activate the environment (replace `myenv` with the name defined in `env.yml` if it differs):
   ```bash
   conda activate myenv
   ```

## Usage

After activating the environment, you can run the training script. The script allows you to choose the environment, set the number of training episodes, and decide whether to use the baseline model or the custom ProtoSAC model.

### Run the Script

You can execute the training process with the following command:

```bash
python main.py --environment 0 --episodes 30000 --baseline False
```

This will train the model for 30,000 episodes in the **Pendulum-v1** environment using the **ProtoSAC** model.

### Arguments

- `--environment`: Choose the environment for training:
    - `0`: **Pendulum-v1** (default)
    - `1`: **LunarLanderContinuous-v3**
    - `2`: **MountainCarContinuous-v0**
    - `3`: **HalfCheetah-v5**
    - `4`: **Humanoid-v5**
    - `5`: **Hopper-v5**
    - `6`: **CarRacing-v3**

- `--episodes`: The number of episodes to run in the selected environment (default: `30000`).

- `--baseline`: Use the baseline SAC model if set to `True`, or use ProtoSAC if set to `False` (default: `False`).
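
The flags above can be sketched with `argparse` as follows. This is a minimal illustration, not the repository's actual parser: `str2bool` and `build_parser` are hypothetical names, and the defaults are copied from the list above. It shows one common way to accept `True`/`False` on the command line, since a plain `type=bool` in `argparse` would turn any non-empty string (including `"False"`) into `True`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to a real boolean (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Assumption: flag names and defaults mirror the Arguments list in this README.
    parser = argparse.ArgumentParser(description="Train ProtoSAC or baseline SAC")
    parser.add_argument("--environment", type=int, default=0, choices=range(7),
                        help="environment index, 0 to 6")
    parser.add_argument("--episodes", type=int, default=30000,
                        help="number of training episodes")
    parser.add_argument("--baseline", type=str2bool, default=False,
                        help="True for baseline SAC, False for ProtoSAC")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--environment", "1", "--baseline", "False"])
print(args.environment, args.episodes, args.baseline)
```

Passing an explicit converter keeps `--baseline False` from silently selecting the baseline model.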

### Example Commands

1. **Train on Pendulum-v1 with the baseline model**:
   ```bash
   python main.py --environment 0 --episodes 50000 --baseline True
   ```

2. **Train on LunarLanderContinuous-v3 with ProtoSAC**:
   ```bash
   python main.py --environment 1 --episodes 100000 --baseline False
   ```

## Available Environments

The project currently supports the following environments:

1. **Pendulum-v1**: A classic continuous control task where the goal is to balance a pendulum in an upright position.
2. **LunarLanderContinuous-v3**: A continuous control task where the agent must land a lunar module safely on the moon’s surface.
3. **MountainCarContinuous-v0**: A task where the agent must drive a car up a mountain to reach a goal position.
4. **HalfCheetah-v5**: Control a simulated 2D cheetah robot to run forward efficiently.
5. **Humanoid-v5**: Control a full-body humanoid robot to walk or run; highly complex.
6. **Hopper-v5**: Control a 2D one-legged robot to hop forward without falling.
7. **CarRacing-v3**: Vision-based racing task where the agent drives a car on random tracks.

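Select one of these environments with the `--environment` argument. It takes the zero-based index listed under [Arguments](#arguments), so the value is one less than the list number above (for example, `--environment 0` selects **Pendulum-v1**).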
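
The mapping from `--environment` index to environment ID can be sketched as a lookup table. The table contents are copied from the Arguments section; the names `ENVIRONMENTS` and `resolve_environment` are hypothetical and may not match how the training script resolves the index.

```python
# Index-to-ID table copied from the --environment list in this README.
ENVIRONMENTS = {
    0: "Pendulum-v1",
    1: "LunarLanderContinuous-v3",
    2: "MountainCarContinuous-v0",
    3: "HalfCheetah-v5",
    4: "Humanoid-v5",
    5: "Hopper-v5",
    6: "CarRacing-v3",
}


def resolve_environment(index: int) -> str:
    """Return the environment ID for an index, with a readable error otherwise."""
    try:
        return ENVIRONMENTS[index]
    except KeyError:
        valid = ", ".join(str(i) for i in sorted(ENVIRONMENTS))
        raise ValueError(
            f"unknown environment index {index}; valid indices: {valid}"
        ) from None


print(resolve_environment(1))
```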

## Model Configuration

The script uses the **ProtoSAC** algorithm by default. Pass `--baseline True` to train with the standard **SAC (Soft Actor-Critic)** algorithm instead.

## Training

Training is handled by the `model.learn()` method. By default it runs for `30000` episodes; use the `--episodes` argument to change this. During training, the model's performance is monitored and video recordings of the agent’s behavior are saved in the `videos/` directory.

## Saving and Loading Models

At the end of training, the model is saved to the `models/` directory. The snippets below show how it is saved and how to load it again:

### Save Model

```python
model.save(f"models/{name_env}")
```

### Load Model

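```python
from stable_baselines3 import SAC  # assumption: SAC is the Stable-Baselines3 class; adjust if the repo defines its own
model = SAC.load(f"models/{name_env}")
```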

You can load the model later for evaluation or continued training.

## Prototype Visualization

During the evaluation phase, you can visualize the prototypes that the `ProtoSAC` model learned during training.

### Evaluation and Visualization

The evaluation script runs the model on a specified environment, visualizes the learned prototypes and action distributions, and saves these visualizations as images and videos. Use it to inspect how the model behaves and what the learned prototypes represent.

To evaluate the model and visualize the prototypes, run:

```bash
python protosac_test.py --environment "Pendulum-v1" --model_path "path_to_trained_model" --episodes 50 --save_dir "evaluation_results"
```

This will:
- Evaluate the model stored at `"path_to_trained_model"` on the `"Pendulum-v1"` environment.
- Run for 50 episodes.
- Save the evaluation results, prototype images, and video in the `"evaluation_results"` directory.
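
To evaluate several models in a row, the command above can be assembled programmatically. The sketch below only builds and prints the argument list; the flag names are taken from the command shown above, while `build_eval_command` and its defaults are hypothetical and not part of the repository.

```python
import shlex
import sys


def build_eval_command(environment, model_path, episodes=50,
                       save_dir="evaluation_results"):
    """Build the argv list for protosac_test.py (hypothetical helper)."""
    return [
        sys.executable, "protosac_test.py",
        "--environment", environment,
        "--model_path", model_path,
        "--episodes", str(episodes),
        "--save_dir", save_dir,
    ]


cmd = build_eval_command("Pendulum-v1", "path_to_trained_model")
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids quoting problems when a model path contains spaces.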

---
