# MARL-SmartGrid Benchmark 
This codebase accompanies our TMLR 2025 submission, in which we introduce **SmartGridScenario**, a custom VMAS environment for multi-agent demand-response in an electrical micro-grid. It provides scripts to **train** and **evaluate** a suite of off-the-shelf MARL algorithms (MAPPO, MADDPG, MASAC, …) under identical conditions, enabling reproducible comparisons and rich per-agent logging.

---

## Repository structure

```text
.
├── run.py          # CLI for training / evaluation (BenchMarl wrapper)
├── eval.py         # One-liner: restore latest checkpoint & run evaluation
├── smart_grid.py   # Definition of the SmartGridScenario
├── data/           # *.npy traces used by the scenario
└── configs/        # Optional YAML overrides for BenchMarl defaults
```


### Data directory

```text
data/
├─ b1/data.npy   # shape (3, T): demand, PV generation, price
├─ b2/data.npy
└─ b3/data.npy
```


Each episode starts at a randomly sampled index into the trace, so agents see different windows of the data across episodes, which improves generalization.
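
The random start can be sketched as follows. This is a minimal illustration, not the code in `smart_grid.py`: the helper names `sample_start` and `episode_window` are hypothetical, and the sketch assumes a full 80-step episode must fit inside the trace (the scenario may handle the end of the trace differently). A toy list-of-lists stands in for the `(3, T)` array; with NumPy, the array returned by `np.load("data/b1/data.npy")` could be sliced the same way with `trace[:, start:start + 80]`.

```python
import random

EPISODE_LEN = 80  # episode length stated in this README


def sample_start(trace_len: int, episode_len: int = EPISODE_LEN, rng=None) -> int:
    """Pick a start index so that a full episode fits inside the trace."""
    if trace_len < episode_len:
        raise ValueError("trace is shorter than one episode")
    rng = rng or random.Random()
    # Valid starts are 0 .. trace_len - episode_len (inclusive).
    return rng.randint(0, trace_len - episode_len)


def episode_window(trace, start: int, episode_len: int = EPISODE_LEN):
    """Slice every channel (demand, PV generation, price) to one episode."""
    return [channel[start:start + episode_len] for channel in trace]


# Toy trace with 3 channels and T = 200 steps (stand-in for data/b1/data.npy).
toy_trace = [[float(t) for t in range(200)] for _ in range(3)]
start = sample_start(len(toy_trace[0]), rng=random.Random(0))
window = episode_window(toy_trace, start)
```

Passing a seeded `random.Random` makes the sampled window reproducible, which can help when comparing algorithms on identical episodes.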

---

## Prerequisites

1. **Clone BenchMarl**  
   This code depends on BenchMarl being available in your working directory:
   ```bash
   git clone https://github.com/oxwhirl/benchmarl.git
   ```

2. **Install dependencies via Conda**  
   Use the provided `environment.yml` to reproduce the exact setup:
   ```bash
   conda env create -f environment.yml
   conda activate benchmarl
   ```
   The environment name to activate is the one set in the `name:` field of `environment.yml`.


## Quick start

1. **Create the Conda environment**  
   ```bash
   conda env create -f environment.yml
   conda activate marl-sg
   ```
2. **Train an algorithm**
   ```bash
   python run.py --algorithm mappo --task customenv
   ```
   This creates:
   ```text
   mappo_1/
    └─ <run-name>/
        ├─ checkpoints/
        ├─ tensorboard/
        └─ configs/
   ```

3. **Evaluate the latest checkpoint**
   ```bash
   python eval.py --folder mappo_1
   ```
   This will:
   - Locate the newest `.pt` under `mappo_1/*/checkpoints/`
   - Restore it in evaluation mode
   - Write per-agent CSV logs to `eval/mappo_1/`
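
The checkpoint lookup can be sketched like this. It is an illustration under assumptions, not the contents of `eval.py`: the function name `find_latest_checkpoint` is hypothetical, and "newest" is taken to mean the most recent modification time (the actual script might instead compare step numbers in the file names).

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(folder: str) -> Optional[Path]:
    """Return the most recently modified .pt file under <folder>/*/checkpoints/."""
    candidates = list(Path(folder).glob("*/checkpoints/*.pt"))
    if not candidates:
        return None  # nothing has been saved yet
    # Assumption: "newest" means latest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `find_latest_checkpoint("mappo_1")` after a training run would return the path to restore, or `None` if no checkpoint exists.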

## Customize hyper-parameters

Drop a YAML file in `configs/`, for example:

```yaml
# configs/experiment.yaml
max_n_frames: 1_000_000
num_envs: 16
evaluation_interval: 5_000
```

`run.py` will merge it automatically.
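
Conceptually, merging means layering the override values on top of the defaults. The sketch below shows one way to do that with plain dictionaries; it is not the logic in `run.py`. The function `merge_overrides`, the default values, and the choice to reject unknown keys are all assumptions made for illustration.

```python
def merge_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied on top."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Design choice of this sketch: fail loudly on misspelled keys.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical defaults; the real values come from BenchMarl's own config files.
defaults = {"max_n_frames": 3_000_000, "num_envs": 1, "evaluation_interval": 120_000}
# Parsed content of a file like configs/experiment.yaml.
overrides = {"max_n_frames": 1_000_000, "num_envs": 16, "evaluation_interval": 5_000}
config = merge_overrides(defaults, overrides)
```

Rejecting unknown keys catches typos early, at the cost of requiring every override to correspond to an existing default.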

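## Supported algorithms

| Flag (`--algorithm`) | BenchMarl config class |
|----------------------|------------------------|
| `mappo`              | `MappoConfig`          |
| `maddpg`             | `MaddpgConfig`         |
| `masac`              | `MasacConfig`          |
| `isac`               | `IsacConfig`           |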

Extend the lookup table in run.py to add more.
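
A lookup table of this kind might look like the sketch below. The names `ALGORITHMS`, `config_class_name`, and `load_config_class` are hypothetical and may not match `run.py`; the added `ippo` entry assumes the `IppoConfig` class shipped in `benchmarl.algorithms`. Class names are stored as strings and imported lazily so the table can be inspected without BenchMarl installed.

```python
import importlib

# Flag -> name of the config class in `benchmarl.algorithms`.
ALGORITHMS = {
    "mappo": "MappoConfig",
    "maddpg": "MaddpgConfig",
    "masac": "MasacConfig",
    "isac": "IsacConfig",
}

# Extending the table with one more algorithm, e.g. independent PPO:
ALGORITHMS["ippo"] = "IppoConfig"


def config_class_name(flag: str) -> str:
    """Map a --algorithm flag to its config class name."""
    try:
        return ALGORITHMS[flag.lower()]
    except KeyError:
        raise ValueError(
            f"unknown algorithm {flag!r}; choose from {sorted(ALGORITHMS)}"
        ) from None


def load_config_class(flag: str):
    """Import the config class only when it is actually needed."""
    module = importlib.import_module("benchmarl.algorithms")
    return getattr(module, config_class_name(flag))
```

With BenchMarl installed, `load_config_class("mappo")` would return the `MappoConfig` class, which can then be instantiated from its default YAML.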

# SmartGridScenario as a Markov Decision Process

```text
MDP ⟨S, A, T, R, γ⟩

S = {
  s = (
    charge  ∈ [0, C_max],       # current battery state-of-charge
    demand  ∈ ℝ₊,               # instantaneous power demand
    price   ∈ ℝ₊,               # current grid electricity price
    backlog ∈ ℝ₊                # postponed (unserved) demand
  )
}

A = {
  a = (
    grid_draw ∈ [−1, 1],        # fraction of max draw from grid
    batt_draw ∈ [−1, 1]         # fraction of max draw from battery
  )
}

T(s, a → s′):
  # Update battery:
  charge′ = clip(charge − batt_draw · P_max, 0, C_max)
  # Determine delivered power:
  delivered = grid_draw · P_max + batt_draw · P_max
  # Update backlog:
  served = min(demand + backlog, delivered)
  backlog′ = max(demand + backlog − delivered, 0)
  # Advance time-series index for demand & price:
  demand′, price′ = next_trace_values()
  s′ = (charge′, demand′, price′, backlog′)

R(s, a, s′):
  cost = price · max(grid_draw · P_max, 0)
  grid_frac = grid_draw · P_max / (demand + ε)
  # Unserved demand is penalized (λ, ν assumed non-negative weights):
  R = −cost − λ·grid_frac − ν·backlog′
```

Each episode lasts 80 steps.
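
The transition and reward can be traced by hand with a small single-agent sketch. It is not the VMAS implementation in `smart_grid.py`: the `State` class, the `step` function, and all constant values (`P_MAX`, `C_MAX`, `LAMBDA`, `NU`, `EPS`) are hypothetical. It also assumes that backlog acts as a penalty, so the backlog term is subtracted from the reward.

```python
from dataclasses import dataclass


@dataclass
class State:
    charge: float   # battery state-of-charge
    demand: float   # instantaneous power demand
    price: float    # grid electricity price
    backlog: float  # postponed (unserved) demand


# Hypothetical constants, chosen only to make the example concrete.
P_MAX, C_MAX = 2.0, 10.0
LAMBDA, NU, EPS = 0.1, 0.5, 1e-6


def step(s: State, grid_draw: float, batt_draw: float,
         next_demand: float, next_price: float):
    """Apply one transition and return (next_state, reward)."""
    charge = min(max(s.charge - batt_draw * P_MAX, 0.0), C_MAX)
    delivered = (grid_draw + batt_draw) * P_MAX
    backlog = max(s.demand + s.backlog - delivered, 0.0)
    cost = s.price * max(grid_draw * P_MAX, 0.0)
    grid_frac = grid_draw * P_MAX / (s.demand + EPS)
    # Assumption: backlog is a penalty, so it enters with a minus sign.
    reward = -cost - LAMBDA * grid_frac - NU * backlog
    return State(charge, next_demand, next_price, backlog), reward


s0 = State(charge=5.0, demand=3.0, price=0.2, backlog=0.0)
s1, r = step(s0, grid_draw=0.5, batt_draw=0.5, next_demand=2.0, next_price=0.3)
```

In this example the agent delivers 2.0 units against a demand of 3.0, so one unit is carried over as backlog and the battery charge drops from 5.0 to 4.0.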

