# Curation Leaks: Membership Inference Attacks Against Data Curation for Machine Learning

This repository contains the code for the ICLR 2026 submission **"Curation Leaks: Membership Inference Attacks Against Data Curation for Machine Learning"**.

## Overview

This work introduces novel membership inference attacks that exploit data curation pipelines used in large-scale machine learning. We demonstrate that even when models are trained exclusively on curated public data, they can leak membership information about the private target data used to guide curation.

## Attack Methods

The repository implements attacks targeting three stages of the curation pipeline:

### 1. Score-Based Attacks
**Files:** `filtered_lira.py`, `imagebased_scores.py`

Attacks that apply when the curation method releases its per-sample curation scores:

- **LiRA (Likelihood Ratio Attack)**: Adapted for curation scores using shadow curation runs
  - `lira_1d()`: 1-dimensional LiRA using univariate Gaussian distributions
  - `lira_attack_nd()`: Multi-dimensional LiRA using multivariate Gaussian distributions

- **Image-Based Voting Attack**: Exploits nearest-neighbor structure in image embedding-based curation
  - `compute_votes()`: Computes membership votes based on similarity patterns
  - Uses correspondence between pool samples and target embeddings
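
As a rough illustration of the univariate LiRA idea, the sketch below fits one Gaussian to shadow-run scores where the target was included and one where it was excluded, then compares the observed score under both. The function names, inputs, and toy numbers are hypothetical and are not the signature of `lira_1d()`.

```python
import math
import statistics

def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def lira_1d_sketch(observed, scores_in, scores_out, min_std=1e-6):
    """Log-likelihood ratio for one (target, pool sample) pair.

    scores_in / scores_out: curation scores of the pool sample in shadow
    curation runs with / without the target (assumed inputs).
    """
    mu_in = statistics.fmean(scores_in)
    mu_out = statistics.fmean(scores_out)
    # Floor the standard deviations so degenerate shadow sets do not divide by zero.
    sd_in = max(statistics.pstdev(scores_in), min_std)
    sd_out = max(statistics.pstdev(scores_out), min_std)
    # Positive values favour "target was a member".
    return gaussian_logpdf(observed, mu_in, sd_in) - gaussian_logpdf(observed, mu_out, sd_out)

# Toy shadow scores, made up for illustration only.
shadow_in = [0.92, 1.05, 0.98, 1.10, 0.95]
shadow_out = [0.10, -0.05, 0.02, 0.08, -0.10]
llr_member = lira_1d_sketch(1.0, shadow_in, shadow_out)
llr_non_member = lira_1d_sketch(0.0, shadow_in, shadow_out)
```

Thresholding the returned log-likelihood ratio gives a membership decision; sweeping the threshold traces out a ROC curve.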

**Key Functions:**
- `give_measurements_fast()`: Identifies high-signal measurements for each target
- `stable_logpmf()`: Numerically stable probability computation
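
When several measurements per target are combined, the same test can be run with multivariate Gaussians. The following is a minimal sketch under that assumption; `lira_nd_sketch`, the ridge term, and the synthetic data are illustrative choices, not the repository's `lira_attack_nd()`.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian via a Cholesky factorization."""
    d = x.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, x - mean)            # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + z @ z)

def lira_nd_sketch(observed, shadow_in, shadow_out, ridge=1e-3):
    """observed: (d,) score vector; shadow_in / shadow_out: (n_runs, d)."""
    d = observed.shape[0]

    def fit(samples):
        # The ridge keeps the covariance positive definite with few shadow runs.
        cov = np.cov(samples, rowvar=False) + ridge * np.eye(d)
        return samples.mean(axis=0), cov

    mu_in, cov_in = fit(shadow_in)
    mu_out, cov_out = fit(shadow_out)
    return mvn_logpdf(observed, mu_in, cov_in) - mvn_logpdf(observed, mu_out, cov_out)

# Synthetic shadow measurements, for illustration only.
rng = np.random.default_rng(0)
shadow_in = rng.normal(1.0, 0.2, size=(32, 3))
shadow_out = rng.normal(0.0, 0.2, size=(32, 3))
llr_nd_member = lira_nd_sketch(np.ones(3), shadow_in, shadow_out)
llr_nd_non_member = lira_nd_sketch(np.zeros(3), shadow_in, shadow_out)
```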

### 2. Subset Selection Attacks
**Files:** `imagebased_noscores.py`

Attacks that apply when only the curated subset is observable:

- **Binary LiRA**: Adapts LiRA to binary selection observations
- **Sigmoid Binary LiRA**: Uses soft binarization to preserve shadow set information
- **Iterative Reconstruction**: For image-based curation, iteratively refines target set hypothesis
  - `imagebased_attack_one_target()`: Iterative elimination process
  - Uses Jaccard similarity to match victim curation patterns
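
One way to read the two binary variants is as a Bernoulli likelihood-ratio test. The sketch below is written under that assumption: the hard version estimates selection rates from 0/1 shadow outcomes, while the soft version passes shadow scores through a sigmoid around the selection threshold so that near-threshold shadow runs still contribute. All names, the smoothing rule, and the `temperature` parameter are hypothetical.

```python
import math

def smoothed_rate(values):
    """Mean of values in [0, 1] with add-one smoothing to avoid log(0)."""
    return (sum(values) + 1.0) / (len(values) + 2.0)

def bernoulli_llr(selected, p_in, p_out):
    """Log-likelihood ratio of a binary observation under two Bernoulli models."""
    if selected:
        return math.log(p_in / p_out)
    return math.log((1.0 - p_in) / (1.0 - p_out))

def binary_lira_sketch(selected, picks_in, picks_out):
    """picks_in / picks_out: 0/1 selection outcomes from shadow runs."""
    return bernoulli_llr(selected, smoothed_rate(picks_in), smoothed_rate(picks_out))

def sigmoid_binary_lira_sketch(selected, scores_in, scores_out, threshold, temperature=0.1):
    """Same test, with shadow scores softly binarized around the threshold."""
    def soft(score):
        return 1.0 / (1.0 + math.exp(-(score - threshold) / temperature))

    p_in = smoothed_rate([soft(s) for s in scores_in])
    p_out = smoothed_rate([soft(s) for s in scores_out])
    return bernoulli_llr(selected, p_in, p_out)

# Toy shadow outcomes and scores, made up for illustration.
hard_llr = binary_lira_sketch(True, [1, 1, 1, 0], [0, 0, 1, 0])
soft_llr = sigmoid_binary_lira_sketch(
    True, [0.9, 0.8, 0.7, 0.45], [0.2, 0.1, 0.55, 0.3], threshold=0.5
)
```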

**Key Functions:**
- `compute_nearest_neighbor_similarities()`: Cosine similarity computation
- `compute_nearest_neighbor_idxs()`: Correspondence mapping between pool and targets
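
The building blocks named in this section (cosine similarity, nearest-target correspondence, and Jaccard similarity between selection sets) can be sketched in a few lines of NumPy. The helper names and toy embeddings below are illustrative and do not reproduce the signatures of the repository functions.

```python
import numpy as np

def cosine_similarities(pool, targets):
    """Pairwise cosine similarity, shape (n_pool, n_targets)."""
    pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return pool @ targets.T

def nearest_target_idxs(pool, targets):
    """Index of the most similar target for every pool sample."""
    return np.argmax(cosine_similarities(pool, targets), axis=1)

def jaccard(a, b):
    """Jaccard similarity of two collections of selected indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty selections are identical
    return len(a & b) / len(a | b)

# Toy 2-D embeddings, for illustration only.
pool = np.array([[1.0, 0.1], [0.0, 2.0], [3.0, 0.0]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
nearest = nearest_target_idxs(pool, targets)   # pool samples 0 and 2 map to target 0
overlap = jaccard([0, 2, 5], [0, 2, 7])        # 2 shared indices out of 4 distinct
```

In an iterative reconstruction, a score like `overlap` could be used to compare the subset selected under a hypothesized target set with the subset the victim actually selected.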

### 3. End-to-End Model Attacks
**Files:** `imge2e/correspondence_attack.py`, `trake2e/trak_attack.py`

Attacks against models trained on curated data. These attacks require injecting fingerprinted samples into the pool:

#### Correspondence Attack (Image-Based)
Uses fingerprinted samples with modified captions to create detectable signals:

**Algorithm Steps:**
1. **Fingerprint Construction**: Create pool samples with high correspondence to specific targets
2. **Correspondence Discovery**: Map fingerprints to targets using similarity scoring (Eq. 10 in paper)
3. **Selection Monitoring**: Detect which fingerprints were selected
4. **Membership Inference**: Compute surprise scores based on selection patterns

**Key Functions:**
- `find_optimal_correspondences()`: Balances attraction to target vs. repulsion from others
- `compute_baseline_rankings()`: Expected percentile ranks without target
- `select_marked_samples()`: Strategic fingerprint selection
- `compute_mia_scores()`: Selection surprise analysis (Algorithm 5)
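
The notion of "selection surprise" can be illustrated with a one-sided surprisal score: a fingerprint that was unlikely to be selected without the target, yet was selected, counts as evidence of membership. This is only one plausible reading, not Algorithm 5 itself; the function name and the assumption that baseline selection probabilities are available are hypothetical.

```python
import math

def surprise_score(selected, baseline_probs, eps=1e-6):
    """Sum of surprisals over the selected fingerprints of one target.

    selected: 0/1 flags, whether each fingerprint ended up in the curated set.
    baseline_probs: assumed probability of selection when the target is absent.
    """
    score = 0.0
    for was_selected, p in zip(selected, baseline_probs):
        p = min(max(p, eps), 1.0 - eps)   # clip to keep the logarithm finite
        if was_selected:
            score += -math.log(p)         # rarer selections are more surprising
    return score

# Toy values, made up for illustration.
score_selected = surprise_score([1, 0], [0.1, 0.1])
score_none = surprise_score([0, 0], [0.1, 0.1])
```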

#### TRAK Attack
Exploits gradient-based curation with augmented captions:

**Algorithm Steps:**
1. **Candidate Creation**: Augment target samples with orthogonal semantic information
2. **Signal Analysis**: Compute influence matrix S = C G_λ⁻¹ Y⊤ (Eq. 13)
3. **SNR Optimization**: Select fingerprints maximizing |Sᵢⱼ|/νᵢ
4. **Hypothesis Testing**: Compare scores under H₀ (target absent) vs H₁ (target present)
5. **Threshold Crossing Detection**: Identify fingerprints crossing selection threshold
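
Steps 2 and 3 can be sketched with NumPy as follows. The sketch assumes that the regularized matrix is `X.T @ X + lam * I` for pool features `X`, that `C` and `Y` hold candidate and target features row-wise, and that `nu` is a per-candidate noise scale; these shapes and names are assumptions rather than the repository's conventions.

```python
import numpy as np

def influence_matrix(C, X, Y, lam=1e-2):
    """S = C (X^T X + lam I)^{-1} Y^T, computed with a linear solve."""
    d = X.shape[1]
    G = X.T @ X + lam * np.eye(d)
    return C @ np.linalg.solve(G, Y.T)

def best_fingerprint_per_target(S, nu):
    """For each target j, pick the candidate i maximizing |S_ij| / nu_i."""
    snr = np.abs(S) / nu[:, None]
    return np.argmax(snr, axis=0), snr

# Random features, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # pool features
C = rng.normal(size=(6, 4))    # candidate fingerprints
Y = rng.normal(size=(3, 4))    # targets
nu = np.ones(6)                # assumed per-candidate noise scale
S = influence_matrix(C, X, Y)
best, snr = best_fingerprint_per_target(S, nu)
```

Solving the linear system instead of forming the explicit inverse is a numerical-stability choice of this sketch.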

**Key Functions:**
- `compute_xtx_inverse()`: Regularized inverse gradient covariance G_λ⁻¹
- `compute_contrastive_scores()`: Sherman-Morrison formula for score updates
- `find_optimal_targets()`: SNR-based fingerprint selection
- `compute_membership_signal()`: Confidence-weighted membership scores (Algorithm 1)
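
The Sherman-Morrison formula lets the regularized inverse be updated when a single sample is added, without re-inverting the full matrix. The sketch below assumes the target enters the matrix as a rank-one term `y y^T`, so that the two hypotheses differ by exactly one update; helper names are hypothetical.

```python
import numpy as np

def regularized_inverse(X, lam=1e-2):
    """(X^T X + lam I)^{-1} for features X of shape (n, d)."""
    return np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))

def add_sample_inverse(G_inv, y):
    """Sherman-Morrison update of a symmetric G_inv after adding y y^T to G."""
    Gy = G_inv @ y
    return G_inv - np.outer(Gy, Gy) / (1.0 + y @ Gy)

# Random features, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = rng.normal(size=5)
G_inv_without = regularized_inverse(X)                # target absent
G_inv_with = add_sample_inverse(G_inv_without, y)     # target present
direct = regularized_inverse(np.vstack([X, y]))       # full recomputation
```

The update costs O(d²) per sample instead of the O(d³) of a fresh inversion, which matters when scores are recomputed for many candidate targets.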
