# WMCal: Weighted Multicalibration

Code for the ICML 2026 submission on weighted multicalibration. This repository contains the implementation of the framework and its experimental evaluation.

## Overview

This codebase supports two main decision-making scenarios:

- **Value Maximization (Top-1)**: Single item selection from predicted utilities
- **Graph Matching**: Bipartite matching on complete graphs using predicted edge weights

## Installation

### Requirements

- Python 3.13+
- [uv](https://docs.astral.sh/uv/) package manager

### Setup

```bash
# Install dependencies
uv sync
```

## Repository Structure

```
wmcal-submission/
├── main.py                    # Run single experiments
├── sweep.py                   # Run parameter sweeps
├── cleanup_logs.py            # Utility to clean experiment logs
├── plot.ipynb                 # Generate figures for paper
├── configs/
│   ├── experiments/           # Individual experiment configs
│   │   ├── default.yaml       # Top-1 value maximization
│   │   └── default-full-graph.yaml  # Graph matching
│   └── sweep/                 # Parameter sweep configs
│       ├── default.yaml       # Sweep for Top-1 (400 experiments)
│       └── default-full-graph.yaml  # Sweep for Graph (100 experiments)
├── wmcal/                     # Main package
│   ├── calibrators/           # Calibration algorithms
│   ├── predictors/            # Prediction models
│   ├── data/                  # Dataset generation
│   ├── experiments/           # Experiment runner
│   ├── plots/                 # Plotting utilities
│   └── utils/                 # Utility functions
├── figures/                   # Generated plots (PDF)
└── .logs/                     # Experiment results (500 experiments)
```

## Quick Start

### Running Experiments

**Single experiment:**
```bash
uv run python main.py --cfg configs/experiments/default.yaml
```

**Parameter sweep:**
```bash
uv run python sweep.py --cfg configs/sweep/default.yaml
```

Optional flags:
- `--redo`: Rerun experiment even if already completed
- `--debug`: Enable debug logging

**Environment variables for sweeps:**
- `WORKERS=8`: Override number of parallel workers
- `DRY_RUN=true`: Show what would run without executing

### Generating Figures

Open and run [plot.ipynb](plot.ipynb) to generate all figures used in the paper.

The notebook writes the following files to `figures/`:
- `value_max_utility.pdf` - Utility metrics for Top-1 value maximization
- `value_max_mse.pdf` - MSE convergence for Top-1 (faceted by output dimension)
- `graph_utility.pdf` - Utility metrics for graph matching
- `graph_mse.pdf` - MSE convergence for graph matching

## Experiment Configurations

### Sweep Configurations

#### Top-1 Value Maximization (`configs/sweep/default.yaml`)

- **Problem**: Synthetic value maximization (single item selection)
- **Parameters swept**:
  - `batch_size`: [16, 64, 256, 1028]
  - `tol`: [0.125, 0.0625, 0.03125, 0.015625, 0.0078125]
  - `output_dim`: [4, 16, 64, 256]
- **Seeds**: 5 (42-46)
- **Total experiments**: 400
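
The total is the size of the Cartesian product of the swept values and the seeds. A minimal sketch of that count, using only the values listed above; the `grid` dictionary and the `expand` helper are illustrative names and are not taken from `sweep.py`:

```python
from itertools import product

# Swept values copied from the list above; seeds 42-46 inclusive.
grid = {
    "batch_size": [16, 64, 256, 1028],
    "tol": [0.125, 0.0625, 0.03125, 0.015625, 0.0078125],
    "output_dim": [4, 16, 64, 256],
    "seed": list(range(42, 47)),
}


def expand(grid):
    """Return one dict per combination of the values in `grid` (hypothetical helper)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


runs = expand(grid)
print(len(runs))  # 4 * 5 * 4 * 5 = 400
```

Dropping the `output_dim` entry from `grid` gives the 4 * 5 * 5 = 100 runs of the graph matching sweep.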

#### Graph Matching (`configs/sweep/default-full-graph.yaml`)

- **Problem**: Bipartite matching on synthetic complete graphs
- **Parameters swept**:
  - `batch_size`: [16, 64, 256, 1028]
  - `tol`: [0.125, 0.0625, 0.03125, 0.015625, 0.0078125]
- **Output dimension**: 45 (fixed: a 10-node complete graph has 10 × 9 / 2 = 45 edges)
- **Seeds**: 5 (42-46)
- **Total experiments**: 100
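
The fixed output dimension of 45 equals the number of unordered node pairs among 10 nodes, 10 * 9 / 2 = 45. A quick check, assuming the predictor emits one weight per edge of a 10-node complete graph (the variable names are illustrative, not taken from the repository):

```python
from itertools import combinations

num_nodes = 10

# Every unordered pair of distinct nodes is one edge of the complete graph.
edges = list(combinations(range(num_nodes), 2))

print(len(edges))  # 45
```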

### Configuration Format

Experiments use YAML configs with the following structure:

```yaml
type: dummy
seed: 42
predictor_config:
  type: simple_net
  input_dim: 10
  output_dim: 75
  epochs: 1000
  lr: 0.01
dataset_config:
  type: synthetic
  test_size: 4096
  predictor_size: 20000
  calibrator_size: null
calibrator_config:
  type: grid_boost
  batch_size: 1024
  tol: 0.01
  grid_resolution: 0.25
  max_iter: 1024
```

See individual experiment configs in `configs/experiments/` for complete details.

## Key Components

### Calibrators

- **GridBoost** (`wmcal/calibrators/grid_boost.py`): Iterative, gradient-boosting-style calibration over a discretized decision space
  - Supports both Top-K and graph matching decision functions
  - Uses Sinkhorn iterations with configurable tolerance
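
For readers unfamiliar with Sinkhorn iterations, the sketch below shows the generic technique: alternately rescaling the rows and columns of a positive matrix until both sets of sums are within a tolerance of 1. It is a textbook illustration in pure Python, not the code in `grid_boost.py`; the `sinkhorn` name and the `tol` and `max_iter` parameters here are assumptions made for the example.

```python
def sinkhorn(matrix, tol=1e-6, max_iter=1000):
    """Rescale a square matrix with positive entries toward doubly stochastic form."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(max_iter):
        # Normalize each row to sum to 1.
        for i in range(n):
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        # Normalize each column to sum to 1.
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
        # Columns now sum to 1; stop once the rows are within tolerance too.
        row_err = max(abs(sum(row) - 1.0) for row in m)
        if row_err < tol:
            break
    return m


scaled = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

A smaller `tol` yields a matrix closer to doubly stochastic at the cost of more iterations.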

### Predictors

- **SimpleNet** (`wmcal/predictors/simple_net.py`): Neural network predictor for utilities/edge weights
  - Configurable architecture (input/output dimensions, learning rate, epochs)
  - Trained on synthetic polynomial utility functions

### Decision Functions

- **Top-K Selection**: `wmcal/data/decision_functions/top_k.py`
- **Graph Matching**: `wmcal/data/decision_functions/graph.py`
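
A minimal sketch of what a Top-K decision function computes, assuming it maps a vector of predicted utilities to the indices of the K largest entries. The `top_k` name and signature below are hypothetical and are not taken from `top_k.py`:

```python
def top_k(utilities, k=1):
    """Return the indices of the k largest utilities, highest first."""
    order = sorted(range(len(utilities)), key=lambda i: utilities[i], reverse=True)
    return order[:k]


predicted = [0.2, 0.9, 0.4, 0.7]
print(top_k(predicted))       # [1]
print(top_k(predicted, k=2))  # [1, 3]
```

With `k=1` this reduces to the single-item selection used in the value maximization scenario.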

### Datasets

- **Synthetic**: Polynomial utility functions (`wmcal/data/datasets/synthetic/`)
- **Synthetic Graph**: Random edge weights for bipartite graphs (`wmcal/data/datasets/synthetic_graph/`)

## Experiment Results

All experiment results are stored in `.logs/{experiment_id}/`:
- `metrics.jsonl`: Logged metrics (MSE, MVS) over training iterations
- `config.yaml`: Full experiment configuration
- `done`: Marker file indicating completion

The submission includes logs for all 500 experiments (400 Top-1 + 100 Graph Matching).
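
A sketch of how the logs could be read back for analysis, assuming each line of `metrics.jsonl` is one JSON object and that a run counts as finished only when its `done` marker exists. The `load_completed_runs` helper is hypothetical, and no particular metric key names are assumed:

```python
import json
from pathlib import Path


def load_completed_runs(log_root=".logs"):
    """Map experiment_id -> list of metric records for runs with a `done` marker."""
    runs = {}
    root = Path(log_root)
    if not root.is_dir():
        return runs
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        metrics_file = run_dir / "metrics.jsonl"
        # Skip runs that are unfinished or have no metrics file.
        if not (run_dir / "done").exists() or not metrics_file.exists():
            continue
        with metrics_file.open() as f:
            runs[run_dir.name] = [json.loads(line) for line in f if line.strip()]
    return runs


runs = load_completed_runs()
print(f"{len(runs)} completed runs found")
```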