# Multi-Agent Reinforcement Learning with Sparse Rewards

This repository contains the implementation of several Multi-Agent Reinforcement Learning (MARL) algorithms designed to handle environments with sparse rewards. The primary focus is on a novel method, "IMAP" (Implicit Model of Agent Preferences), which uses learned reward models, including rule-based and LLM-based approaches, to densify sparse environmental rewards.

The implemented algorithms are evaluated on the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo benchmarks.

## Features

- **MAPPO**: Implementation of Multi-Agent Proximal Policy Optimization.
- **SparseMAPPO**: MAPPO trained with the original sparse rewards.
- **Online-IPL**: Online Inverse Preference Learning, which learns a reward model.
- **SL-MAPPO**: MAPPO combined with a supervised-learning reward model.
- **IMAP-GA / IMAP-LA**: Rule-based reward models with different advantage calculations.
- **IMAP-LLM**: Reward models based on Large Language Models (Gemma, Qwen).

## Environments

- **StarCraft Multi-Agent Challenge (SMAC)**: A popular benchmark for cooperative MARL.
- **Multi-Agent MuJoCo**: Continuous control tasks with multiple agents.

## Project Structure

```
.
├── envs/                   # Environment wrappers for SMAC and MuJoCo
├── graphs/                 # Saved plots from analysis notebooks
├── logs/                   # Training logs and model checkpoints
├── main.py                 # Main script for running SMAC experiments
├── main_mujoco.py          # Main script for running MuJoCo experiments
├── ipl_iql.py              # Implementation of reward models (IPL/IQL)
├── policy.py               # Actor and Critic network definitions
├── runner.py               # Handles environment interaction and data collection
├── trainer.py              # PPO and reward model training logic
├── analyze_smac.ipynb      # Jupyter notebook for analyzing SMAC results
├── analyze_mujoco.ipynb    # Jupyter notebook for analyzing MuJoCo results
├── config.py               # Configuration for experiments
└── README.md
```

## Installation

1.  **Clone the repository:**
    ```bash
    git clone <repository-url>
    cd <repository-name>
    ```

2.  **Create a Python virtual environment:**
    ```bash
    python -m venv .venv
    source .venv/bin/activate
    ```

3.  **Install dependencies:**
    This project requires Python 3.9+.
    ```bash
    pip install torch numpy pandas plotly "tyro>=0.7.0" "transformers>=4.40.0" "tensordict>=0.4.0" "tqdm"
    ```

4.  **Install SMAC:**
    Follow the SMAC installation instructions to install the StarCraft II game and the SMAC environment.
    You may also need to download the SMAC maps separately.

5.  **Install MuJoCo:**
    To run the MuJoCo experiments, also install the Multi-Agent MuJoCo environment.
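
After installation, a quick import check can confirm that the packages from step 3 are visible in the active virtual environment. The following is a minimal sketch: `find_missing` is a hypothetical helper, not part of this repository, and it only checks that each module can be located, not that its version is recent enough.

```python
import importlib.util

# Import names of the packages listed in the pip command in step 3.
REQUIRED = ["torch", "numpy", "pandas", "plotly", "tyro",
            "transformers", "tensordict", "tqdm"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The SMAC and MuJoCo environments are left out of the list on purpose, because their import names depend on which distribution you install.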

## Usage

### Training

To run experiments on SMAC, use `main.py`; the environment name and other parameters are passed as command-line flags. MuJoCo experiments use `main_mujoco.py` instead.

**Example:**

```bash
python main.py --env_name="protoss_5_vs_5" --n_envs=8 --h_dim=512
```
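
To launch several runs from Python, for example one per map, the same command can be assembled as an argument list and passed to `subprocess.run`. This is a sketch only: `build_command` and `launch` are hypothetical helpers, and they use just the three flags shown in the example, since the full set of options is defined by the repository's own configuration.

```python
import subprocess
import sys


def build_command(env_name, n_envs=8, h_dim=512, script="main.py"):
    """Build the argument list for one training run, using the flags from the example."""
    return [
        sys.executable, script,
        f"--env_name={env_name}",
        f"--n_envs={n_envs}",
        f"--h_dim={h_dim}",
    ]


def launch(env_names):
    """Run the training script once per environment name, one run after another."""
    for env_name in env_names:
        # check=True stops the sweep as soon as one run exits with an error.
        subprocess.run(build_command(env_name), check=True)
```

Calling `launch(["protoss_5_vs_5"])` from the repository root reproduces the command above. Using `sys.executable` keeps the runs inside the active virtual environment.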

`main.py` runs the `ipl_rule_based` algorithm by default. To switch to a different algorithm (e.g., `ipl_llm_based`), edit the `algo` variable in `main.py`.

Logs and model checkpoints will be saved in the `logs/` directory.

### Evaluation and Analysis

The results of the experiments can be analyzed using the provided Jupyter notebooks:
- `analyze_smac.ipynb`: For SMAC experiments.
- `analyze_mujoco.ipynb`: For MuJoCo experiments.

These notebooks load the saved results (`.npz` files) and generate plots that compare the performance of the different algorithms. Jupyter must be installed to open them (`pip install notebook`).

```bash
jupyter notebook analyze_smac.ipynb
```
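
To see what a results file holds before opening a notebook, the archive can be inspected directly with NumPy. This is a minimal sketch that assumes nothing about the array names this repository stores: `summarize_npz` and `print_summary` are hypothetical helpers, and the file path is whatever `.npz` file your run produced.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in a .npz archive to its (shape, dtype)."""
    summary = {}
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


def print_summary(path):
    """Print one line per array stored in the archive."""
    for name, (shape, dtype) in summarize_npz(path).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

`np.load` is called with its default settings here, so an archive that contains pickled object arrays will raise an error instead of being loaded.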
