# Analysis and Visualization Scripts

This repository provides standalone Python scripts (one summarization script and three plotting scripts) used to **summarize**, **aggregate**, and **visualize** watermark stability statistics across users, random seeds, datasets, and models **(Figures 2, 3, 4, and 6 in the paper)**.
All scripts rely only on CSV/JSON artifacts generated by upstream experiments.

---

## 1. Per-user seed stability summarization

### Script

`summary_seed_stability.py`

### Purpose

This script processes raw `water*.json` probe outputs and produces a **per-user summary** of `second_match` stability across random seeds.

For each combination of:

* dataset
* model
* configuration (`Txxx_Ux_Px` extracted from `wm_version`)
* user ID

it computes:

* the mean `second_match` rate across seeds
* the variance of that rate across seeds
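The per-user summary can be pictured as a grouped mean and variance over seeds. The sketch below is illustrative only: the `seed` and `second_match_rate` column names and the toy values are assumptions, and it uses the sample variance (pandas default, `ddof=1`), which may differ from what `summary_seed_stability.py` actually computes.

```python
import pandas as pd

# Hypothetical per-seed records: one row per (user, seed) holding that
# seed's second_match rate. Column names and values are illustrative.
records = pd.DataFrame({
    "dataset": ["cnn"] * 4,
    "model": ["m1"] * 4,
    "config": ["T5000_U50_P40"] * 4,
    "user_id": [0, 0, 1, 1],
    "seed": [1, 2, 1, 2],
    "second_match_rate": [0.2, 0.4, 1.0, 1.0],
})

keys = ["dataset", "model", "config", "user_id"]

# One output row per (dataset, model, config, user_id).
# "var" is the sample variance (ddof=1); this choice is an assumption.
summary = (
    records.groupby(keys)["second_match_rate"]
    .agg(n_seeds="count", mean_second_match="mean", var_second_match="var")
    .reset_index()
)
print(summary)
```

The resulting columns mirror the ones listed in the Output section.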

Special handling is applied to news-style datasets with chunked inputs (e.g. CNN-style datasets), where multiple chunks belonging to the same document row are aggregated using a logical OR rule.
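As a rough illustration of the logical OR rule, a document row counts as a match if any of its chunks matched. The column names `row_id` and `chunk_id` below are hypothetical and are not taken from the actual probe output schema.

```python
import pandas as pd

# Hypothetical chunk-level results for two document rows.
chunks = pd.DataFrame({
    "row_id": [0, 0, 0, 1, 1],
    "chunk_id": [0, 1, 2, 0, 1],
    "second_match": [False, True, False, False, False],
})

# Logical OR across the chunks of each row: a row matches if any chunk matched.
row_level = chunks.groupby("row_id")["second_match"].any()

# Row-level match rate for this seed.
rate = row_level.mean()
print(row_level.tolist(), rate)
```

Aggregating before computing the rate keeps long documents, which produce more chunks, from being counted several times.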

### Input

* A directory tree containing probe outputs named `water*.json`
* Dataset and model are inferred **only** from the directory structure:

```
.../dataset/processed/<dataset>/probe_outputs/<model>/water*.json
```
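One possible way to recover the dataset and model from such a path is to locate the `probe_outputs` component and read its neighbours. This is a sketch under that assumption, not the script's actual code; the function name and the example path are made up for illustration.

```python
from pathlib import PurePosixPath


def infer_dataset_and_model(json_path):
    """Return (dataset, model) from .../<dataset>/probe_outputs/<model>/water*.json."""
    parts = PurePosixPath(json_path).parts
    if "probe_outputs" not in parts:
        raise ValueError(f"no 'probe_outputs' component in {json_path!r}")
    idx = parts.index("probe_outputs")
    # Need a dataset directory before, and a model directory plus file after.
    if idx == 0 or idx + 2 >= len(parts):
        raise ValueError(f"unexpected layout: {json_path!r}")
    return parts[idx - 1], parts[idx + 1]


# Hypothetical path following the layout above.
example = "root/dataset/processed/cnn/probe_outputs/llama/water_seed1.json"
dataset, model = infer_dataset_and_model(example)
print(dataset, model)
```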

### Output

A single CSV file summarizing per-user seed stability:

```
GLOBAL_user_seed_second_match_summary.csv
```

with columns:

* `dataset`
* `model`
* `config`
* `user_id`
* `n_seeds`
* `mean_second_match`
* `var_second_match`

### Usage

```bash
python summary_seed_stability.py \
  --root /path/to/experiment_root \
  --out_csv GLOBAL_user_seed_second_match_summary.csv
```

### Key options

| Argument    | Description                                           | Default                                     |
| ----------- | ----------------------------------------------------- | ------------------------------------------- |
| `--root`    | Root directory searched recursively for `water*.json` | `.`                                         |
| `--out_csv` | Path to output CSV                                    | `GLOBAL_user_seed_second_match_summary.csv` |

---

## 2. Aggregation and plotting across users

These scripts cover the first three experiments in the paper (Experiments 4.2, 4.3, and 4.4). For the baseline (Experiment 4.6), refer to the baseline folder.

This repository provides three plotting scripts with the same input interface but different aggregation logic and visualization styles.

### plot_xp1.py
**(Experiment 4.2, Figure 2)**

**Purpose.** Produces a single-column **1×3 multi-panel figure** (one panel per dataset), showing regurgitation **rate** as a function of the number of watermarked documents (P).


**What it does.**

* Reads per-user CSV summaries.
* Extracts P from `config`.
* Aggregates across users for each `(dataset, model, P)`.
* Computes t-based confidence intervals on the rate.
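The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `plot_xp1.py`: the function names, the toy data, and the output column names (`n_users`, `mean_rate`, `ci_low`, `ci_high`) are assumptions. The critical value comes from the Student t distribution when `scipy` is installed and otherwise from a normal approximation, in line with the Notes section.

```python
from statistics import NormalDist

import pandas as pd

try:
    from scipy import stats  # optional dependency
except ImportError:
    stats = None


def critical_value(n_users, ci=0.95):
    """Two-sided critical value: Student t if scipy is present, else normal."""
    upper = 0.5 + ci / 2.0
    if stats is not None:
        return float(stats.t.ppf(upper, df=n_users - 1))
    return NormalDist().inv_cdf(upper)


def aggregate_over_users(per_user, ci=0.95, min_users=2):
    df = per_user.copy()
    # Pull the integer after "P" out of config strings such as "T5000_U50_P40".
    df["P"] = df["config"].str.extract(r"P(\d+)", expand=False).astype(int)
    agg = (
        df.groupby(["dataset", "model", "P"])["mean_second_match"]
        .agg(n_users="count", mean_rate="mean", std_rate="std")
        .reset_index()
    )
    # Drop points backed by too few users for an interval to be meaningful.
    agg = agg[agg["n_users"] >= min_users].copy()
    sem = agg["std_rate"] / agg["n_users"].pow(0.5)
    crit = agg["n_users"].map(lambda n: critical_value(n, ci))
    agg["ci_low"] = agg["mean_rate"] - crit * sem
    agg["ci_high"] = agg["mean_rate"] + crit * sem
    return agg.reset_index(drop=True)


# Toy per-user summary (values are made up).
per_user = pd.DataFrame({
    "dataset": ["d"] * 4,
    "model": ["m"] * 4,
    "config": ["T100_U1_P10"] * 3 + ["T100_U1_P20"],
    "user_id": [0, 1, 2, 0],
    "mean_second_match": [0.2, 0.4, 0.6, 0.9],
})
out = aggregate_over_users(per_user)
print(out)
```

The interval is not clipped to [0, 1] here, so with few users the bounds can fall outside the valid range for a rate.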

**Typical output.**

* One 1×3 figure (PNG/PDF) in `analysis_out/plots/`.

**Purpose.** Produces a single combined panel aggregating across datasets and models, showing regurgitation behavior as a function of the number of watermarked documents (P).

**What it does.**

* Reads per-user CSV summaries.
* Extracts P from `config`.
* Aggregates across users for each `(dataset, model, P)`.
* Supports two Y-axis modes:

  * `count`: expected number of regurgitated replies.
  * `rate`: success ratio.
* Computes t-based confidence intervals.

**Typical output.** One figure (PNG/PDF) in `plots_combined/` with multiple curves (models × datasets).

### plot_xp2.py
**(Experiment 4.3, Figure 3)**

**Purpose.** Produces a multi-panel (1×3) figure, one panel per dataset, showing regurgitation **rate** as a function of training set size (T).

**What it does.**

* Reads per-user CSV summaries.
* Extracts T from `config`.
* Aggregates across users for each `(dataset, model, T)`.
* Computes t-based confidence intervals on the rate.

**Typical output.** One 1×3 figure (PNG/PDF) in `plots/`, with independent x-axes for each dataset.

### plot_xp3.py
**(Experiment 4.4, Figures 4 and 6)**

**Purpose.** Generates a main single-column 1×3 figure (one panel per dataset) **plus** a set of appendix figures, analyzing regurgitation rate as a function of the number of unique watermarks (U).

**What it does.**

* Reads a CSV containing per-user `mean_second_match` values (optionally with chunk-level aggregation already applied upstream).
* Parses `T`, `U`, and `P` from `config` (e.g., `T5000_U50_P40`).
* Optionally filters to a specific `--T` and/or `--P` (auto-detected if unique).
* **Main figure:** aggregates across users for each `(dataset, model, U)` and adds t-based confidence intervals across users.
* **Appendix figures:** for each `(dataset, model)`, plots the per-user curves directly (no CI), using distinct styling per user.
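Parsing the configuration string and auto-detecting a filter value could look like the sketch below. The helper names `parse_config` and `resolve_filter` are hypothetical, and the real script may handle malformed configs or ambiguous values differently.

```python
import re

# Matches configs such as "T5000_U50_P40".
CONFIG_RE = re.compile(r"T(\d+)_U(\d+)_P(\d+)")


def parse_config(config):
    """Return {"T": ..., "U": ..., "P": ...} as integers."""
    match = CONFIG_RE.search(config)
    if match is None:
        raise ValueError(f"cannot parse config {config!r}")
    t, u, p = (int(g) for g in match.groups())
    return {"T": t, "U": u, "P": p}


def resolve_filter(values, requested=None):
    """Use the requested value if given; otherwise auto-detect a unique one."""
    if requested is not None:
        return requested
    unique = set(values)
    if len(unique) == 1:
        return unique.pop()
    raise ValueError(f"value is ambiguous, pass it explicitly: {sorted(unique)}")


parsed = [parse_config(c) for c in ["T5000_U50_P40", "T5000_U100_P40"]]
t_value = resolve_filter([p["T"] for p in parsed])  # unique, so auto-detected
print(parsed[0], t_value)
```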

**Typical output.**

* Main figure (PNG/PDF) in `plots/`.
* Appendix figures (PNG/PDF) in `plots/appendix/`.

### Scripts

* `plot_xp3.py`
* `plot_xp2.py`
* `plot_xp1.py`

### Purpose

The plotting scripts aggregate the per-user CSV outputs produced by the first script and generate:

* aggregated statistics across users for each `(dataset, model, X)` group, where `X` is the swept parameter (P for `plot_xp1.py`, T for `plot_xp2.py`, U for `plot_xp3.py`)
* confidence intervals (t-based)
* publication-ready figures in single-column layout

The resulting plots show how the regurgitation (or match) rate evolves with the swept parameter.

### Input

One or more CSV files matching a glob pattern, typically generated by Script 1:

```
GLOBAL_user_seed_second_match_summary_*.csv
```

Each CSV must contain the following columns:

* `dataset`
* `model`
* `config`
* `user_id`
* `n_seeds`
* `mean_second_match`
* `var_second_match`
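A loader that collects the matching files and checks the required columns might look like this. It is a sketch only; `load_summaries` is a hypothetical helper, not a function exported by the plotting scripts.

```python
import glob
import os

import pandas as pd

REQUIRED_COLUMNS = [
    "dataset", "model", "config", "user_id",
    "n_seeds", "mean_second_match", "var_second_match",
]


def load_summaries(input_dir, pattern):
    """Concatenate every CSV in input_dir matching pattern, validating columns."""
    paths = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no CSV matches {pattern!r} in {input_dir!r}")
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
        if missing:
            raise ValueError(f"{path} is missing columns: {missing}")
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

Failing early on a missing column gives a clearer error than a `KeyError` raised later during aggregation.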

### Output

1. Aggregated CSV:

```
agg_by_dataset_model_T_user_CI_rate.csv
```

2. Figures (PNG and PDF):

```
plots/
  singlecol_1x3_indepx_CI95_rate.png
  singlecol_1x3_indepx_CI95_rate.pdf
```

### Usage

```bash
python plot_xp1.py \
  --input_dir . \
  --pattern "GLOBAL_user_seed_second_match_summary_1_*.csv" \
  --out_dir analysis_out \
  --ci 0.95

# Same CLI for XP2
python plot_xp2.py \
  --input_dir . \
  --pattern "GLOBAL_user_seed_second_match_summary_1_*.csv" \
  --out_dir analysis_out \
  --ci 0.95

# XP3
python plot_xp3.py \
  --input_csv GLOBAL_user_probeSeedInJson_chunkLevel_second_match_summary.csv \
  --T 5000 --P 40 \
  --out_dir analysis_out
```

### Key options

| Argument         | Description                             | Default                                         |
| ---------------- | --------------------------------------- | ----------------------------------------------- |
| `--input_dir`    | Directory containing per-user CSV files | `.`                                             |
| `--pattern`      | Glob pattern to match input CSVs        | `GLOBAL_user_seed_second_match_summary_1_*.csv` |
| `--out_dir`      | Output directory for CSVs and figures   | `analysis_out`                                  |
| `--min_users`    | Minimum users required to keep a point  | `2`                                             |
| `--ci`           | Confidence level for intervals          | `0.95`                                          |
| `--fig_width_in` | Figure width in inches (single-column)  | `3.25`                                          |
| `--dpi`          | Output DPI for PNG                      | `300`                                           |
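For reference, the options in the table correspond to an `argparse` setup along these lines. This is a reconstruction from the table only; the actual scripts may define additional flags, and the help strings and argument types are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Aggregate per-user summaries and plot them."
    )
    parser.add_argument("--input_dir", default=".",
                        help="Directory containing per-user CSV files")
    parser.add_argument("--pattern",
                        default="GLOBAL_user_seed_second_match_summary_1_*.csv",
                        help="Glob pattern to match input CSVs")
    parser.add_argument("--out_dir", default="analysis_out",
                        help="Output directory for CSVs and figures")
    parser.add_argument("--min_users", type=int, default=2,
                        help="Minimum users required to keep a point")
    parser.add_argument("--ci", type=float, default=0.95,
                        help="Confidence level for intervals")
    parser.add_argument("--fig_width_in", type=float, default=3.25,
                        help="Figure width in inches (single-column)")
    parser.add_argument("--dpi", type=int, default=300,
                        help="Output DPI for PNG")
    return parser


# Override one option and keep the documented defaults for the rest.
args = build_parser().parse_args(["--ci", "0.9"])
print(args)
```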

---

## Typical workflow

```bash
# Step 1: summarize per-user stability across seeds
python summary_seed_stability.py --root /path/to/experiments

# Step 2: aggregate across users and generate figures
python plot_xp1.py --input_dir .
```

---

## Notes

* If `scipy` is unavailable, confidence intervals fall back to a normal approximation instead of the t-distribution.

---

## Requirements

* Python ≥ 3.8
* numpy
* pandas
* matplotlib
* scipy (optional, recommended)
