# Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems  

## Supplementary Code  

This repository accompanies the paper **"Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems" (ICLR 2026)**, which investigates how reward function design impacts **cooperative resilience** in Multi-Agent Reinforcement Learning (MARL).  

In dynamic and failure-prone environments, agents must not only optimize individual objectives but also ensure the **collective system remains functional under disruptions**. We define cooperative resilience as the ability of agents to **anticipate, resist, recover, and adapt** in the presence of external shocks. This repository provides tools and experiments to study and improve this emergent property through reward learning guided by inverse reinforcement learning (IRL).  

We introduce a **reward learning framework** that learns reward functions from **ranked trajectories**, where the ranking is given by a cooperative resilience score. Agents are then trained in **social dilemma environments** using:  

* **(i)** Traditional individual reward functions  
* **(ii)** Inferred rewards aligned with cooperative resilience  
* **(iii)** Hybrid rewards combining both  
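
Option (iii) can be read as a weighted sum of the other two signals. The sketch below is a minimal illustration under that assumption; the function name `hybrid_reward` and the mixing coefficient `weight` are hypothetical, and the combination rule used in the paper may differ.

```python
def hybrid_reward(individual: float, inferred: float, weight: float = 0.5) -> float:
    """Convex combination of the environment reward and the inferred reward.

    `weight` is a hypothetical mixing coefficient: 0.0 keeps only the
    individual reward (i), 1.0 keeps only the inferred reward (ii).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * individual + weight * inferred


# An agent collects an apple (+1.0) while the inferred model penalizes the step (-0.4).
print(hybrid_reward(1.0, -0.4, weight=0.25))
```

Setting `weight` to 0.0 or 1.0 recovers the purely individual or purely inferred reward, so the three training regimes can share one code path.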

The reward inference is performed using two preference-based IRL algorithms across three types of parameterizations:  

* **Handcrafted features**  
* **Linear reward models**  
* **Neural networks**  
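
To make learning from ranked trajectories concrete, the following sketch scores a linear reward model with a Bradley-Terry style pairwise loss, one common formulation of preference-based reward learning. It is not necessarily either of the two algorithms used in this repository; the helper names and the two-dimensional toy features are assumptions for illustration.

```python
import math
from itertools import combinations


def trajectory_return(weights, trajectory):
    """Sum of linear rewards w . phi(s) over the feature vectors of a trajectory."""
    return sum(sum(w * f for w, f in zip(weights, feats)) for feats in trajectory)


def ranked_pairs(scored):
    """Turn (trajectory, score) items into (preferred, other) pairs; ties are skipped."""
    pairs = []
    for (traj_a, score_a), (traj_b, score_b) in combinations(scored, 2):
        if score_a > score_b:
            pairs.append((traj_a, traj_b))
        elif score_b > score_a:
            pairs.append((traj_b, traj_a))
    return pairs


def preference_loss(weights, pairs):
    """Mean Bradley-Terry negative log-likelihood of the observed preferences."""
    total = 0.0
    for better, worse in pairs:
        gap = trajectory_return(weights, better) - trajectory_return(weights, worse)
        # -log(sigmoid(gap)), written in a numerically stable form
        total += math.log1p(math.exp(-abs(gap))) + max(-gap, 0.0)
    return total / len(pairs)


# Toy data: hypothetical features are [resource_preserved, resource_consumed] per step.
scored = [
    ([[1.0, 0.0], [1.0, 0.0]], 0.9),  # higher resilience score
    ([[0.0, 1.0], [0.0, 1.0]], 0.2),  # lower resilience score
]
pairs = ranked_pairs(scored)
print(preference_loss([1.0, -1.0], pairs) < preference_loss([-1.0, 1.0], pairs))
```

Weights that rank the higher-scored trajectory above the lower-scored one give a smaller loss, which is the signal a gradient-based optimizer would follow.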

Our results show that **resilience-guided rewards** lead to improved robustness and coordination, helping agents avoid catastrophic outcomes (e.g., resource depletion), without sacrificing individual performance. We further extend the experiments to **larger 16×16 environments with 4 agents and multiple resource clusters**, and evaluate agents under **three disruption protocols** (resource removal, regeneration slowdown, and agent perturbation).  

---

## 📁 Repository Structure  

* `src/`: Core source code (agents, environment, IRL models, metrics, trajectories, utils)
* `data/`: Sample trajectories and inferred reward models (`data/learning/`)
* `models/`: Trained PPO and QMIX agents grouped by strategy (`baseline`, `hybrid`, `resilience`), plus the best-performing models (`best`)
* `scripts/`: Training and evaluation scripts
* `visualization/`: Tools for plotting heatmaps, trajectory maps, and feature visualizations
* `results/`: Saved results from experimental runs
* `resilience/`: Metrics to evaluate cooperative resilience


---

## 🚀 Getting Started  

1. **Clone the repository:**  

```bash
git clone https://github.com/anon-researcher-ICLR-2026/iclr2026-supplementary-code.git
```

2. **Create a virtual environment and install dependencies:**

```bash
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
# source venv/bin/activate

pip install -r requirements.txt
```

> **Note:** Keep `venv/`, large models, and local artifacts out of git (see `.gitignore`).

---

## 📈 Generating Trajectories & Resilience Scores

You can generate your own agent-environment trajectories using:

```bash
python scripts/generate_random_scored_trajectories.py
```

Our experiments rely on **precomputed resilience-scored trajectories**, available at: https://drive.google.com/file/d/1Y4hkSGUbrzo-NVPo5w8Tk32KYBCAVhuZ/view?usp=sharing

The metric used to **evaluate and rank trajectories** by cooperative resilience is implemented in `resilience/resilience_metrics.py`.

This metric combines fairness, sustainability, and disruption recovery into a unified score.
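
As a rough illustration of how three such components could be folded into one number, the sketch below uses an equal-weight mean and a Gini-based fairness term. Both choices, along with the function names and the example values, are assumptions made for this example; the actual definition is the one in `resilience/resilience_metrics.py`.

```python
def fairness(consumption):
    """1 - Gini coefficient of per-agent consumption (1.0 means perfectly equal)."""
    n = len(consumption)
    total = sum(consumption)
    if total == 0:
        return 1.0  # nobody consumed anything, so shares are trivially equal
    abs_diffs = sum(abs(a - b) for a in consumption for b in consumption)
    gini = abs_diffs / (2 * n * total)
    return 1.0 - gini


def unified_score(fair, sustain, recover, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of three components, each assumed to lie in [0, 1]."""
    components = (fair, sustain, recover)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


# Hypothetical episode: one agent ate all 6 apples, 25% of the resource
# survived, and post-disruption performance reached half of its earlier level.
score = unified_score(fairness([6, 0]), sustain=0.25, recover=0.5)
print(round(score, 3))
```

Ranking trajectories then amounts to sorting them by this scalar, which is the form of input the reward learning step expects.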

---

## ⚙️ Training IRL Models

Train preference-based IRL with different parameterizations:
```bash
python scripts/learning_irl_handcrafted_model.py
python scripts/learning_irl_linear_model.py
python scripts/learning_irl_nn_model.py
```
Outputs are stored under `data/learning/` (separated by resilience-only vs. hybrid setups and MPL/PPL variants).

---

## 🧪 Training Agents with Inferred Rewards (PPO & QMIX)

Train PPO with inferred rewards (examples):
```bash
python scripts/train_ppo_with_irl_handcrafted_reward.py
python scripts/train_ppo_with_irl_linear_reward.py
python scripts/train_ppo_with_irl_nn_reward.py
```

QMIX baseline:

> **Note:** QMIX training was performed externally using the **PyMARL** framework.  
> This repository provides the trained models (the **agent network**, the **mixer**,  
> and the **optimizer state**) so they can be loaded and evaluated directly.  
> The QMIX scripts in this repository therefore cover **evaluation**, not training.


Models are saved under `models/` (organized by baseline / resilience / hybrid, and by MPL/PPL).

---

## 📊 Evaluation Protocols

Agents are evaluated in **8×8 (two agents)** and **16×16 (four agents, three trees)** environments under three disruption protocols:

- **Resource removal** (apple deletion at a fixed timestep/ratio)
- **Regeneration slowdown** (temporary drop in regrowth rate)
- **Agent perturbation** (randomized motion period)
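
The three protocols can be sketched as small hooks applied inside an environment step loop. This is a simplified illustration, not the repository's implementation: the function names, the set-of-coordinates apple representation, and all parameter values are assumptions.

```python
import random


def remove_resources(apples, ratio, rng):
    """Resource removal: return the apple positions left after deleting `ratio` of them."""
    n_remove = int(round(ratio * len(apples)))
    removed = set(rng.sample(sorted(apples), n_remove))
    return apples - removed


def regrowth_probability(base_rate, t, start, duration, factor):
    """Regeneration slowdown: scale the regrowth rate by `factor` inside the window."""
    in_window = start <= t < start + duration
    return base_rate * factor if in_window else base_rate


def choose_action(policy_action, t, start, duration, n_actions, rng):
    """Agent perturbation: use a uniformly random action inside the window."""
    if start <= t < start + duration:
        return rng.randrange(n_actions)
    return policy_action


rng = random.Random(0)  # fixed seed so the disruption is reproducible
apples = {(row, col) for row in range(2) for col in range(4)}
survivors = remove_resources(apples, ratio=0.5, rng=rng)
print(len(apples), "->", len(survivors))
```

Keeping each disruption as a pure function of the timestep and a seeded generator makes it straightforward to replay the same shock across baseline, resilience, and hybrid agents.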

Example evaluation scripts:
```bash
# 8×8 PPO or QMIX (examples)
python scripts/disruption_protocol_evaluation_qmix.py
python scripts/disruption_protocol_evaluation_ppo.py

# 16×16 PPO (four agents)
python scripts/disruption_protocol_evaluation_large.py
```

These scripts log:
- Cooperative resilience
- Episode length
- Cumulative consumption
- Last-apple events
- Spatial patterns (optional)

---

## 📊 Visualizations

Generate heatmaps from trained agents:

```bash
python visualization/plot_heatmaps.py
```

Alternatively, visualize agent behavior through an animation that compares a trained policy against a random baseline:

```bash
python visualization/visualization.py
```

A demo video is also available: https://youtu.be/4AdLDhyKqKY

---


## 📁 Reproducibility Notes

- **Code & configs:** this repository includes training and evaluation scripts reproducing the main experiments.
- **Pretrained agents:** provided under `models/` for quick replication (baseline, resilience, hybrid, best, example).
- **Trajectories:** scripts to regenerate or score; precomputed assets may be referenced via an anonymized link.
- **Appendix:** the paper appendix includes full implementation details, hyperparameters, and evaluation settings.
- **Anonymized materials:** an anonymized GitHub and a zipped copy of the source are provided as supplementary files.

> This repository follows the ICLR Reproducibility Statement guidelines. See “Reproducibility Statement” in the main paper for exact pointers to sections, appendix items, and supplemental assets.

---

## 📩 Contact

**To be provided upon de-anonymization.**
