# GMAN: Graph Mixing Additive Networks

This repository contains the configuration files and execution scripts needed to reproduce the GMAN paper's experiments on the PhysioNet 2012 (P12) and FakeNews (GossipCop) datasets.

We also include all code used to pre-process the Crohn's Disease (CD) dataset and to run the CD experiments. The CD data itself cannot be shared openly due to Danish privacy and data protection legislation.

## Running PhysioNet 2012 (P12) experiments

### Prerequisites
- Prepare the P12 data and splits referenced by your chosen config (the configs live under `configs/P_12/`).
- Optionally, set up Weights & Biases or disable it with `--wandb False`.

### Basic multi-GPU training
Use the DDP (gloo) runner with a YAML config and optional CLI overrides:

```bash
python run_physionet_ddp_gloo.py \
  --config_path configs/P_12/biology_informed.yaml \
  --run_name dev_bio \
  --n_layers 2 \
  --hidden_channels 32 \
  --lr 1e-3 \
  --batch_size 32 \
  --val_batch_size 64 \
  --test_batch_size 64 \
  --gnan_mode per_group \
  --seed 42
```

Notes:
- World size is detected from `torch.cuda.device_count()`. If fewer than 2 GPUs are available, the script will exit.
- Checkpoints are written to `P12_checkpoints/ddp_gloo_<run_name>`, and the final metrics are written as JSON to `training_results/`.
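
The GPU requirement noted above can be checked before launching a job. This is a minimal sketch: `require_multi_gpu` is a hypothetical helper that is not part of this repository, and the runner's own check may differ (it exits rather than raising).

```python
def require_multi_gpu(device_count: int, minimum: int = 2) -> int:
    """Return the world size, or raise if too few GPUs are visible.

    Hypothetical pre-flight helper; it only mirrors the documented rule
    that the DDP runner needs at least two GPUs.
    """
    if device_count < minimum:
        raise RuntimeError(
            f"Found {device_count} GPU(s); the DDP runner needs at least {minimum}."
        )
    return device_count


# Usage (requires PyTorch):
#   import torch
#   world_size = require_multi_gpu(torch.cuda.device_count())
```

Passing the count in as an argument keeps the helper free of a hard PyTorch dependency, so it can also be used on a login node without GPUs.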

### Selecting a grouping/config
We provide several example configs under `configs/P_12/`:
- `biology_informed.yaml`
- `flat_grouping.yaml`
- `grouping_clinical_panels.yaml`
- `grouping_dynamics_frequency.yaml`
- `grouping_shock_axes.yaml`

Pick one and pass its path via `--config_path`.
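
To run every grouping in turn, the commands can be generated from the list above. The following is a sketch only: `build_command` and the `sweep_<config>` run names are hypothetical, and it uses only flags shown elsewhere in this README.

```python
from pathlib import Path

CONFIG_DIR = Path("configs/P_12")
CONFIG_NAMES = [
    "biology_informed",
    "flat_grouping",
    "grouping_clinical_panels",
    "grouping_dynamics_frequency",
    "grouping_shock_axes",
]


def build_command(config_name: str, seed: int = 42) -> list[str]:
    """Build the argv list for one P12 run (hypothetical helper)."""
    config_path = CONFIG_DIR / f"{config_name}.yaml"
    return [
        "python", "run_physionet_ddp_gloo.py",
        "--config_path", config_path.as_posix(),
        "--run_name", f"sweep_{config_name}",  # run-name scheme is an assumption
        "--seed", str(seed),
    ]


commands = [build_command(name) for name in CONFIG_NAMES]
for cmd in commands:
    print(" ".join(cmd))
    # To launch: import subprocess; subprocess.run(cmd, check=True)
```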

### Ablation studies
The P12 runner supports the same ablation flags used in our paper. Toggle any of the following:

```bash
# Disable DeepSet aggregation within groups
--disable_deepset

# Remove distance embedding (uses simple mean aggregation over distances)
--disable_distance_embedding

# Use simple aggregation at the final stage (mean instead of sum)
--use_simple_aggregation

# Swap the feature processor used inside GNANs
--feature_processor_type simple_linear   # options: gnan | simple_linear | identity
```

Example ablation run:
```bash
python run_physionet_ddp_gloo.py \
  --config_path configs/P_12/biology_informed.yaml \
  --disable_deepset \
  --feature_processor_type simple_linear \
  --run_name ablate_simple_linear
```

### Prepare P12 data locally (48h cache)
Download the P12 dataset from [PhysioNet Challenge 2012](https://www.physionet.org/content/challenge-2012/1.0.0/). Then run the local formatter to cache the 48-hour subset used in our experiments:

```bash
python format_p12_data.py
```
This step writes a cached local version of the P12 48h data that the loaders consume during training.

### Cached PSV dataset mode (optional)
If you maintain a cached directory of `.psv` files (48h subset), you can point the runner to it directly:

```bash
python run_physionet_ddp_gloo.py \
  --config_path configs/P_12/biology_informed.yaml \
  --use-cached-dataset \
  --cached_dataset_dir /path/to/local_psv_cache \
  --split_pkl_path P12_data_splits/split_1.pkl \
  --predictive_label mortality
```
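
Before launching in cached mode, a quick sanity check of the inputs can save a failed multi-GPU start. The sketch below assumes the `.psv` files sit directly inside the cache directory (not in subfolders). `check_cached_inputs` is a hypothetical helper that does not inspect file contents or the structure of the split pickle.

```python
from pathlib import Path


def check_cached_inputs(cached_dataset_dir: str, split_pkl_path: str) -> int:
    """Verify the cache directory and split file exist; return the .psv count."""
    cache_dir = Path(cached_dataset_dir)
    if not cache_dir.is_dir():
        raise FileNotFoundError(f"Cache directory not found: {cache_dir}")

    # Assumption: a flat directory of .psv files, one level deep.
    n_files = sum(1 for _ in cache_dir.glob("*.psv"))
    if n_files == 0:
        raise FileNotFoundError(f"No .psv files found in {cache_dir}")

    if not Path(split_pkl_path).is_file():
        raise FileNotFoundError(f"Split file not found: {split_pkl_path}")
    return n_files


# Usage:
#   n = check_cached_inputs("/path/to/local_psv_cache", "P12_data_splits/split_1.pkl")
#   print(f"Found {n} cached .psv files")
```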

### Disabling W&B logging
```bash
python run_physionet_ddp_gloo.py --config_path configs/P_12/biology_informed.yaml --wandb False
```

### Outputs
- Checkpoints: `P12_checkpoints/ddp_gloo_<auto_or_run_name>/best_params_by_val_*.pth`
- Final results JSON: `training_results/results_ddp_gloo_<config_name>_seed<seed>_gpus<world_size>_<timestamp>.json`
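
The results filename pattern above encodes the config, seed, and GPU count, so runs can be collected for comparison. This is a sketch only: `parse_result_name` and `load_results` are hypothetical helpers, the timestamp in the example is made up for illustration, and the JSON contents are loaded as-is because their schema is not documented here.

```python
import json
import re
from pathlib import Path

# Mirrors: results_ddp_gloo_<config_name>_seed<seed>_gpus<world_size>_<timestamp>.json
RESULT_NAME = re.compile(
    r"results_ddp_gloo_(?P<config>.+)_seed(?P<seed>\d+)"
    r"_gpus(?P<gpus>\d+)_(?P<timestamp>.+)\.json"
)


def parse_result_name(filename: str) -> dict:
    """Split a results filename into its documented components."""
    match = RESULT_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"Unexpected results filename: {filename}")
    info = match.groupdict()
    info["seed"] = int(info["seed"])
    info["gpus"] = int(info["gpus"])
    return info


def load_results(results_dir: str = "training_results") -> list[dict]:
    """Load every results JSON in the directory along with its parsed name."""
    runs = []
    for path in sorted(Path(results_dir).glob("results_ddp_gloo_*.json")):
        info = parse_result_name(path.name)
        with path.open() as fh:
            info["metrics"] = json.load(fh)  # schema not assumed
        runs.append(info)
    return runs


# Hypothetical filename; the real timestamp format may differ.
parsed = parse_result_name(
    "results_ddp_gloo_biology_informed_seed42_gpus2_20250101-120000.json"
)
print(parsed)
```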