# SPARe SimGrid Simulators

## Platform Generation
`platform.xml` can be regenerated with the provided script:

```sh
cd DES
python3 generate_platform.py
```

This overwrites `platform.xml` with the current script settings.

### JSON Config (generate_platform.py)
Generate an example config:

```sh
python3 generate_platform.py --example > config.json
```

Use a custom config and output path:

```sh
python3 generate_platform.py --config config.json --out platform.xml
```

Top-level JSON keys:
- `workers`: explicit list of workers. Each worker needs `id`, `speed`, `disk_read_bw`, and `disk_write_bw`; `levels` and `group` are optional (both are used by the routing rules).
- `workers_range`: list of templates expanded by numeric ranges.
- `workers_template`: single template or list of templates (same format as `workers_range`).
- `overrides`: list of per-worker overrides using glob patterns (applied after expansion).
- `links`: map of link name to `{bandwidth, latency}`.
- `routes`: explicit routes (when present, routing rules are ignored).
- `routing`: routing rules used when `routes` is not provided.

Worker templates (`workers_range` or `workers_template`) support:
- `ranges`: map of variable -> list or `"start..end"` range.
- `id`, `speed`, `disk_read_bw`, `disk_write_bw`: format strings using `{var}`.
- Optional `levels`: map of level name -> `{var}` or literal value.

Example template:

```json
{
  "workers_range": [
    {
      "ranges": {"i": "0..3", "rack": [0, 1]},
      "id": "worker{i}_r{rack}",
      "speed": "200Gf",
      "disk_read_bw": "2GBps",
      "disk_write_bw": "1GBps",
      "levels": {"node": "{i}", "rack": "{rack}", "cluster": 0}
    }
  ]
}
```
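
To show how a template turns into concrete workers, here is a minimal sketch of the expansion. It is not the code in `generate_platform.py`. It assumes that `"start..end"` is inclusive of both ends, that workers are produced from the Cartesian product of all range variables, and that only the fields listed above are formatted. The helper names `parse_range` and `expand_template` are hypothetical.

```python
import itertools

# Template fields treated as format strings over the range variables.
FORMATTED_FIELDS = ("id", "speed", "disk_read_bw", "disk_write_bw")

def parse_range(spec):
    """Return the values for one range variable.

    A list is used as-is. A "start..end" string is ASSUMED to be inclusive
    of both ends (check generate_platform.py for the real behavior).
    """
    if isinstance(spec, str):
        start, end = spec.split("..")
        return list(range(int(start), int(end) + 1))
    return list(spec)

def expand_template(template):
    """Expand one workers_range template into concrete worker dicts."""
    names = list(template["ranges"])
    value_lists = [parse_range(template["ranges"][name]) for name in names]
    workers = []
    # One worker per combination of range values (Cartesian product).
    for combo in itertools.product(*value_lists):
        env = dict(zip(names, combo))
        worker = {field: template[field].format(**env) for field in FORMATTED_FIELDS}
        # Level values may be "{var}" strings or literals (e.g. "cluster": 0).
        worker["levels"] = {
            name: value.format(**env) if isinstance(value, str) else value
            for name, value in template.get("levels", {}).items()
        }
        workers.append(worker)
    return workers

example = {
    "ranges": {"i": "0..3", "rack": [0, 1]},
    "id": "worker{i}_r{rack}",
    "speed": "200Gf",
    "disk_read_bw": "2GBps",
    "disk_write_bw": "1GBps",
    "levels": {"node": "{i}", "rack": "{rack}", "cluster": 0},
}

for w in expand_template(example):
    print(w["id"], w["levels"])
```

Under the inclusive assumption the example yields 8 workers (`worker0_r0`, `worker0_r1`, ..., `worker3_r1`); an exclusive reading of `0..3` would yield 6.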

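Example override (applied after template expansion; the glob `worker0*` matches `worker0_r0` and `worker0_r1` from the template above):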

```json
{
  "overrides": [
    {"id": "worker0*", "speed": "250Gf", "disk_read_bw": "3GBps"}
  ]
}
```
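
The glob matching can be sketched with Python's `fnmatch` module. This is an illustration, not the script's implementation: it assumes overrides are applied in list order (so later entries win) and that matching is case-sensitive; `apply_overrides` is a hypothetical name.

```python
from fnmatch import fnmatchcase

def apply_overrides(workers, overrides):
    """Apply glob-based overrides in list order; later overrides win (assumed)."""
    for override in overrides:
        pattern = override["id"]
        changes = {key: value for key, value in override.items() if key != "id"}
        for worker in workers:
            # fnmatchcase: shell-style glob, case-sensitive on every platform.
            if fnmatchcase(worker["id"], pattern):
                worker.update(changes)
    return workers

workers = [
    {"id": "worker0_r0", "speed": "200Gf", "disk_read_bw": "2GBps"},
    {"id": "worker0_r1", "speed": "200Gf", "disk_read_bw": "2GBps"},
    {"id": "worker1_r0", "speed": "200Gf", "disk_read_bw": "2GBps"},
]
apply_overrides(workers, [{"id": "worker0*", "speed": "250Gf", "disk_read_bw": "3GBps"}])
for w in workers:
    print(w["id"], w["speed"], w["disk_read_bw"])
```

Globs match from the start of the id, so `worker0*` does not touch `worker10_r0`, but `worker1*` would match both `worker1_r0` and `worker10_r0`.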

Routing rules:
- `routes`: list of `{src, dst, link}` or `{src, dst, links: [..]}`.
- `routing.levels`: ordered list of `{name, link}`; the first level (in list order) on which src and dst match selects the link.
- `routing.default_link`: fallback when no level matches.
- `routing.intra_link` / `routing.inter_link`: chosen by comparing the workers' `group` values (`intra_link` when src and dst share a group, `inter_link` otherwise).
- `routing.pairs`: list of `{src, dst, link}` or `{src, dst, links}`; for multiple explicit routes use top-level `routes`.
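
A minimal sketch of link selection for one src/dst pair follows. The README does not state the precedence between the rule types, so the order used here (`pairs`, then `levels`, then `intra_link`/`inter_link`, then `default_link`) is an assumption, as is reading "matching level" as "src and dst have the same value for that level". `select_links` is a hypothetical helper, not part of `generate_platform.py`.

```python
def select_links(src, dst, routing):
    """Return the link ids for the src -> dst route.

    ASSUMED precedence: pairs, then levels, then intra/inter by group,
    then default_link.
    """
    for pair in routing.get("pairs", []):
        if pair["src"] == src["id"] and pair["dst"] == dst["id"]:
            return list(pair["links"]) if "links" in pair else [pair["link"]]

    # ASSUMED meaning of a matching level: src and dst share its value.
    src_levels, dst_levels = src.get("levels", {}), dst.get("levels", {})
    for level in routing.get("levels", []):
        name = level["name"]
        if name in src_levels and name in dst_levels and src_levels[name] == dst_levels[name]:
            return [level["link"]]

    if "intra_link" in routing and "inter_link" in routing and "group" in src and "group" in dst:
        same_group = src["group"] == dst["group"]
        return [routing["intra_link"] if same_group else routing["inter_link"]]

    if "default_link" in routing:
        return [routing["default_link"]]
    raise ValueError(f"no link for {src['id']} -> {dst['id']}")

routing = {
    "levels": [
        {"name": "rack", "link": "rack_link"},
        {"name": "cluster", "link": "cluster_link"},
    ],
    "default_link": "wan_link",
}
a = {"id": "worker0_r0", "levels": {"rack": "0", "cluster": 0}}
b = {"id": "worker1_r0", "levels": {"rack": "0", "cluster": 0}}
c = {"id": "worker0_r1", "levels": {"rack": "1", "cluster": 0}}
print(select_links(a, b, routing))  # ['rack_link']
print(select_links(a, c, routing))  # ['cluster_link']
```

Listing levels from most local to least local, as above, makes the first match the tightest shared level.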

### Platform Description (platform.xml)
The script writes a SimGrid platform with:
- `<platform version="4.1">` and a single `<zone id="AS0" routing="Full">`.
- One `<host>` per worker with `id` and `speed`, plus a `<disk id="disk0">` child.
- One `<link>` per entry in `links` (bandwidth + latency).
- `<route>` entries between worker pairs with `<link_ctn id="...">` per hop.

If `routes` is provided, those are used directly. Otherwise routing rules select links
based on `levels`, `intra_link`/`inter_link`, or `default_link`.
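
The resulting XML layout can be sketched with Python's standard `xml.etree.ElementTree`. This is not the generator's code: the `read_bw`/`write_bw` disk attributes are SimGrid's attribute names and are assumed to be filled from `disk_read_bw`/`disk_write_bw`, and the `route_links` callback is a hypothetical stand-in for the routing rules. One route is emitted per unordered worker pair because SimGrid `<route>` elements are symmetrical by default.

```python
import itertools
import xml.etree.ElementTree as ET

def build_platform(workers, links, route_links):
    """Build the <platform> tree; route_links(src_id, dst_id) -> list of link ids."""
    platform = ET.Element("platform", version="4.1")
    zone = ET.SubElement(platform, "zone", id="AS0", routing="Full")
    # Hosts first, each with a single disk child.
    for w in workers:
        host = ET.SubElement(zone, "host", id=w["id"], speed=w["speed"])
        ET.SubElement(host, "disk", id="disk0",
                      read_bw=w["disk_read_bw"], write_bw=w["disk_write_bw"])
    # Then one <link> per entry in the links map.
    for name, spec in links.items():
        ET.SubElement(zone, "link", id=name,
                      bandwidth=spec["bandwidth"], latency=spec["latency"])
    # Finally one <route> per unordered pair, with a <link_ctn> per hop.
    for a, b in itertools.combinations(workers, 2):
        route = ET.SubElement(zone, "route", src=a["id"], dst=b["id"])
        for link_id in route_links(a["id"], b["id"]):
            ET.SubElement(route, "link_ctn", id=link_id)
    return platform

def to_xml(platform):
    header = ('<?xml version="1.0"?>\n'
              '<!DOCTYPE platform SYSTEM "https://simgrid.org/simgrid.dtd">\n')
    return header + ET.tostring(platform, encoding="unicode")

workers = [
    {"id": "w0", "speed": "200Gf", "disk_read_bw": "2GBps", "disk_write_bw": "1GBps"},
    {"id": "w1", "speed": "200Gf", "disk_read_bw": "2GBps", "disk_write_bw": "1GBps"},
]
links = {"l0": {"bandwidth": "10GBps", "latency": "1us"}}
print(to_xml(build_platform(workers, links, lambda src, dst: ["l0"])))
```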

## Simulators
This repo contains three SimGrid-based simulators for distributed training:
- `src/DP_DES.cpp`: vanilla data parallelism (single replica).
- `src/DPR_DES.cpp`: data parallelism with replication.
- `src/SPARe_DES.cpp`: order-aware replication with resequencing and stack depth control.

## Build
Build from `DES/`, and run the binaries from there as well so the default `platform.xml` path resolves correctly:

```sh
cd DES
make all
```

Executables are written to `bin/`.

## Run
Example (SPARe):

```sh
mkdir -p logs
./bin/SPARe_test --steps=100 \
  --ckpt=5 \
  --fail-dist=exponential \
  --exp-rate=0.005 \
  --recover=10 \
  --workers=9 \
  --compute-jitter=0.1 \
  --replicate_level=3 \
  2>&1 | tee logs/SPARe.log
```
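
To repeat the run above over several seeds, a small driver script can assemble the flags and capture each run's output. This is a hypothetical helper, not part of the repo; it only uses flags documented in this README, and the per-seed log naming is an assumption.

```python
import subprocess
from pathlib import Path

def build_args(binary, params):
    """Turn {"steps": 100, "fail-dist": "exponential"} into CLI flags.

    Keys are used verbatim, so keep each flag's exact spelling
    (e.g. "fail-dist" but "replicate_level").
    """
    return [binary] + [f"--{name}={value}" for name, value in params.items()]

# Same settings as the SPARe example above.
BASE_PARAMS = {
    "steps": 100,
    "ckpt": 5,
    "fail-dist": "exponential",
    "exp-rate": 0.005,
    "recover": 10,
    "workers": 9,
    "compute-jitter": 0.1,
    "replicate_level": 3,
}

def run_seeds(binary, base_params, seeds, log_dir="logs"):
    """Run one simulation per seed, writing stdout and stderr to one log file each."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    name = Path(binary).name
    for seed in seeds:
        # Use nonzero seeds: --seed=0 falls back to wall-clock time.
        cmd = build_args(binary, {**base_params, "seed": seed})
        with open(Path(log_dir) / f"{name}_seed{seed}.log", "w") as log:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
```

From `DES/`, `run_seeds("./bin/SPARe_test", BASE_PARAMS, seeds=[1, 2, 3])` would write `logs/SPARe_test_seed1.log` through `logs/SPARe_test_seed3.log`.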

## Parameters
Common to DP/DPR/SPARe:
- `--steps=INT`: total training steps.
- `--ckpt=INT`: checkpoint interval (steps).
- `--compute=FLOPS`: forward/backward compute cost.
- `--control=FLOPS`: control-phase compute cost.
- `--model=BYTES`: checkpoint size.
- `--grad=BYTES`: gradient size for allreduce.
- `--allreduce-fail-scale=FLOAT`: scale factor for allreduce size when a failure happens before/during allreduce.
- `--data=BYTES`: per-step data read size.
- `--cpu-scale=FLOAT`: global CPU scale factor.
- `--fail-prob=FLOAT`: failure probability for `--fail-dist=bernoulli`.
- `--fail-dist=STR`: failure distribution: `bernoulli`, `exponential`, or `weibull`.
- `--exp-rate=FLOAT`: exponential rate (lambda); the mean time to failure is 1/lambda.
- `--weibull-shape=FLOAT`: Weibull shape (k).
- `--weibull-scale=FLOAT`: Weibull scale (lambda); unlike `--exp-rate`, this is a time, not a rate.
- `--recover=FLOAT`: full system recovery time (seconds).
- `--workers=INT`: number of workers to launch.
- `--compute-jitter=FLOAT`: per-step compute jitter (stddev).
- `--seed=UINT`: RNG seed (0 uses wall-clock time).
- `--platform=STRING`: platform path (.xml, see [Platform Generation](#platform-generation)).
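
The failure parameters follow the standard parameterizations, which can be checked with Python's `random` module. This sketch only illustrates what the numbers mean. It does not reproduce how the simulators decide which worker fails or when draws happen, and the time unit is whatever the simulator's clock uses. The helper names are hypothetical.

```python
import math
import random

def time_to_failure(rng, dist, exp_rate=None, weibull_shape=None, weibull_scale=None):
    """Draw one time-to-failure for the continuous --fail-dist options."""
    if dist == "exponential":
        # expovariate takes the rate lambda; the mean is 1 / lambda.
        return rng.expovariate(exp_rate)
    if dist == "weibull":
        # weibullvariate(alpha, beta): alpha is the scale (lambda), beta the shape (k).
        return rng.weibullvariate(weibull_scale, weibull_shape)
    raise ValueError(f"unsupported distribution: {dist}")

def fails_this_trial(rng, fail_prob):
    """Bernoulli trial: True with probability fail_prob."""
    return rng.random() < fail_prob

def mean_time_to_failure(dist, exp_rate=None, weibull_shape=None, weibull_scale=None):
    """Closed-form mean: 1/lambda (exponential), lambda * Gamma(1 + 1/k) (Weibull)."""
    if dist == "exponential":
        return 1.0 / exp_rate
    return weibull_scale * math.gamma(1.0 + 1.0 / weibull_shape)

rng = random.Random(42)  # fixed seed, analogous to passing a nonzero --seed
draws = [time_to_failure(rng, "exponential", exp_rate=0.005) for _ in range(100_000)]
print(sum(draws) / len(draws), mean_time_to_failure("exponential", exp_rate=0.005))
```

With `--exp-rate=0.005`, as in the run example, the mean time to failure is 200 time units.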

DPR/SPARe only:
- `--decision=FLOPS`: controller decision compute cost.
- `--partial-recover-time=FLOAT`: partial recovery time (seconds).
- `--replicate_level=INT`: replication level (>=1).

## Constraints (DPR/SPARe)
- `replicate_level < 29`.
- `WORLD_SIZE > 2 * RULER_TABLE[replicate_level - 1].back() - 1`.
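
The two constraints can be checked before launching a run. In this sketch, `WORLD_SIZE` is assumed to equal the `--workers` count, and `EXAMPLE_RULER_TABLE` holds placeholder values for illustration only; the real values come from `RULER_TABLE` in the simulator sources. The function name is hypothetical.

```python
def check_replication_constraints(replicate_level, world_size, ruler_table):
    """Return a list of violations (an empty list means the configuration is OK).

    ruler_table[k - 1] is a sorted list whose last element is the largest
    mark for replication level k, mirroring RULER_TABLE[k - 1].back().
    """
    if not 1 <= replicate_level < 29:
        return ["replicate_level must be >= 1 and < 29"]
    last_mark = ruler_table[replicate_level - 1][-1]
    bound = 2 * last_mark - 1
    if world_size > bound:
        return []
    return [f"WORLD_SIZE must be > {bound} for replicate_level={replicate_level}"]

# Placeholder values for illustration only, NOT the simulator's RULER_TABLE.
EXAMPLE_RULER_TABLE = [[0], [0, 1], [0, 1, 3], [0, 1, 4, 6]]

print(check_replication_constraints(3, 9, EXAMPLE_RULER_TABLE))  # []
```

With these placeholder values, `replicate_level=3` needs more than 5 workers, so the `--workers=9` run example would pass.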

## Logging
- Logs are emitted after each simulated event completes (compute, I/O, communication, control).
- SPARe logs include stack depth, controller phase decisions, and resequencing changes.
