# Experiment Guide

## Prerequisites
- Python 3.11 environment.
- Install dependencies once: `pip install numpy networkx scipy`.
- Run commands from the repository root (the directory containing this file).

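Most scripts are configured through environment variables. In PowerShell use `$env:VAR = "value"`; in bash use `export VAR=value`. The option lists below use the PowerShell form; in bash, drop the `$env:` prefix.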
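As a cross-platform alternative to setting variables in the shell, a small Python launcher can pass overrides to a single run. The following is a minimal sketch, not part of the repository: `build_env` and `run_experiment` are hypothetical helper names, and it assumes the scripts read their options from the process environment as described above.

```python
import os
import subprocess
import sys


def build_env(overrides):
    """Return a copy of the current environment with the overrides applied.

    Environment values must be strings, so every value is converted with str().
    """
    env = os.environ.copy()
    env.update({name: str(value) for name, value in overrides.items()})
    return env


def run_experiment(script, overrides):
    """Run one experiment script with the given overrides (hypothetical helper).

    Call this from the repository root so the script path resolves.
    check=True raises CalledProcessError if the script exits with an error.
    """
    return subprocess.run(
        [sys.executable, script],
        env=build_env(overrides),
        check=True,
    )
```

For example, `run_experiment("eta_experiment.py", {"TARGET_LAMBDA": 8, "QUICK": 1})` would launch the quick pass without changing the parent shell's environment.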

## Experiments

### `lambda_experiment.py`
- **Goal**: sweep total review load per paper (`lambda_per_paper`) while comparing direct, ambiguous, and promising review pipelines.
- **Run**: `python lambda_experiment.py`.
- **Key options**:
  - `$env:ETA_ORIG` (default `0.4`) selects the two-stage split used for the ambiguous/promising baselines.
  - The script always evaluates the hard-coded list `[5, 6, 7, ..., 18]` for `lambda_per_paper`.
- **Outputs**: CSV `lambda_experiment_results_YYYYMMDD_HHMMSS.csv` with the mean and standard deviation of accuracy, F1, precision, recall, BP error, calibration, KL, and JS. A text summary prints to stdout.

### `paper_num_lambda_per_paper_experiment.py`
- **Goal**: study scalability as the number of papers and reviewers grows while keeping `lambda_per_paper` fixed.
- **Run**: `python paper_num_lambda_per_paper_experiment.py`.
- **Key options**:
  - `$env:TARGET_LAMBDA` (default `8`) fixes the per-paper review load (and the per-reviewer load, because the number of papers equals the number of reviewers).
  - `$env:ETA_ORIG` (default `0.4`) sets the two-stage split for ambiguous/promising runs.
- **Outputs**: CSV `paper_num_lambda_per_paper_experiment_results_YYYYMMDD_HHMMSS.csv` summarising all metrics; the script also prints a per-paper-count summary table.

### `eta_experiment.py`
- **Goal**: evaluate multiple lambda-allocation strategies across eta values in `[0.1, 0.9]` under homogeneous reviewer quality.
- **Run**: `python eta_experiment.py` (add `--quick` for a fast sanity pass).
- **Key options**:
  - `$env:TARGET_LAMBDA` (default `8`) defines the total review load per paper.
  - `$env:FRAC_GRID` (default `0.25,0.5,0.75`) enables extra `constant_stage2_frac_xx` strategies.
  - `$env:LAMBDA2_FRAC` adds a legacy single fractional strategy when needed.
  - `$env:LAMBDA2_MAX` (default `18`) caps stage-2 intensity for safety.
  - `$env:QUICK=1` mirrors `--quick` (fewer eta values, 5 trials, strategies limited to `original`, `fixed_stage1_4`, `fixed_stage1_3`).
- **Outputs**: CSV `eta_multi_strategy_results_YYYYMMDD_HHMMSS.csv`; stdout prints per-strategy analysis and overall best configurations.

### `eta_mixture_experiment.py`
- **Goal**: repeat the eta/strategy sweep when reviewer reliabilities follow a spammer-hammer prior that BP only observes as a prior distribution.
- **Run**: `python eta_mixture_experiment.py` (supports `--quick`).
- **Key options**:
  - `$env:TARGET_LAMBDA` (default `8`).
  - `$env:PAPER_NUM` and `$env:REVIEWER_NUM` (default `100` each).
  - `$env:ETA_GRID` (comma list, default `0.1,0.2,...,0.9`).
  - `$env:TRIALS` (default `20`).
  - `$env:Q_EXP`, `$env:FRAC_EXP`, `$env:Q_BASE` define the mixture prior (defaults `0.9`, `0.5`, `0.5`).
  - `$env:FRAC_GRID`, `$env:LAMBDA2_FRAC`, `$env:STRATEGIES` behave as in `eta_experiment.py`.
  - `$env:QUICK=1` shrinks the eta grid to `{0.4,0.6}` and trials to `5`.
- **Outputs**: CSV `eta_mixture_strategy_results_YYYYMMDD_HHMMSS.csv` plus a console summary (best balanced/ambiguous/promising configurations and direct baseline).
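
Comma-list variables such as `ETA_GRID` and `FRAC_GRID` are plain strings in the environment. How the scripts parse them is not shown here, so the sketch below is only an illustration of one reasonable approach: `parse_float_list` and `read_grid` are hypothetical names, and the fallback list is supplied by the caller.

```python
import os


def parse_float_list(raw, default):
    """Parse a comma-separated string such as "0.4, 0.6" into a list of floats.

    Returns a copy of `default` when the variable is unset or blank.
    Empty items (for example from a trailing comma) are skipped.
    """
    if raw is None or not raw.strip():
        return list(default)
    return [float(item) for item in raw.split(",") if item.strip()]


def read_grid(name, default, environ=None):
    """Look up `name` in the environment and parse it as a float list.

    `environ` can be any mapping, which makes the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    return parse_float_list(environ.get(name), default)


# Falls back to the documented FRAC_GRID default when the variable is unset.
frac_grid = read_grid("FRAC_GRID", [0.25, 0.5, 0.75])
```

A malformed entry such as `FRAC_GRID=0.25,abc` raises `ValueError` in this sketch, which surfaces typos before a long run starts.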


## Tips
- Leave the built-in seeding in place for reproducible comparisons; only override seeds if you have a custom scenario in mind.
- Generated CSVs land in the repo root; move them under `results/` for long-term storage to keep the workspace tidy.
- Plotting helpers such as `plot_strategy_calibration_curves.py` can ingest the CSVs above to create figures once experiments finish.
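
To illustrate the housekeeping tips above, the following sketch locates the newest timestamped CSV for an experiment and moves finished results under `results/`. It is not part of the repository: `latest_results` and `archive_results` are hypothetical helpers, and they assume only the `<prefix>_YYYYMMDD_HHMMSS.csv` naming shown in the Outputs entries.

```python
import shutil
from pathlib import Path


def latest_results(prefix, root="."):
    """Return the newest `<prefix>_YYYYMMDD_HHMMSS.csv` in `root`, or None.

    The timestamp is zero-padded, so lexicographic order is chronological.
    """
    candidates = sorted(Path(root).glob(f"{prefix}_*.csv"))
    return candidates[-1] if candidates else None


def archive_results(prefix, root=".", dest="results"):
    """Move every matching CSV from `root` into `root/dest`; return new paths."""
    target = Path(root) / dest
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(root).glob(f"{prefix}_*.csv")):
        new_path = shutil.move(str(path), str(target / path.name))
        moved.append(Path(new_path))
    return moved
```

For example, `latest_results("lambda_experiment_results")` returns the most recent lambda sweep in the repository root, and `archive_results("lambda_experiment_results")` moves all of them into `results/`.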
