# Black-box and Adaptive Dataset Inference for Large Language Models

This repository contains the reference implementation of the BADI methodology (Black-box and Adaptive Dataset Inference).

## Abstract

Large language models (LLMs) are trained on massive, largely uncurated internet-scale datasets that may contain copyrighted content. This raises a critical question: did a given dataset contribute to an LLM's training? Dataset inference (DI) tackles this challenge by extracting membership signals from a suspect set and aggregating the extracted signals with a statistical test. However, the DI method faces two significant practical limitations. First, it assumes gray-box access to token-level probabilities, whereas most commercial LLM APIs expose only generated text. We address this problem by estimating per-token probabilities from label-only outputs, enabling DI in a fully black-box setting. Second, the DI pipeline relies on fixed hypothesis testing via p-values, which requires committing in advance to both a suspect set and a significance level. This is costly for large suspect sets and often inconclusive for small sets, while adding samples adaptively breaks the validity guarantees of p-values. To overcome this limitation, we introduce a DI framework based on e-values and sequential testing, where e-values provide anytime-valid guarantees and support optional continuation. This allows evidence to be accumulated iteratively as more suspect samples are queried. Through these two advances, token probability approximation from label-only outputs and e-value–based sequential testing, we make DI practical for real-world LLM auditing under black-box access with adaptive evidence accumulation.

## Key Files (in order)

1. `./main/raw_values.py`
   - Purpose: generates raw per-token similarity from the dataset, together with `log_probability` (used only for reference models and plots).
   - Key function: `raw_values_batch`
     - Computes token-level similarity (per token) in batches.
     - Also collects per-token log-probabilities.
     - Outputs raw, per-token features for downstream use.

2. `./main/di.py`
   - After similarity is generated, this file builds model features ("mfeatures").
   - Consumes per-token similarities and prepares features for metric computation.

3. `./main/metrics.py`
   - Generates metrics.
   - Important functions:
     - `get_losses_from_dict`: estimates loss from token probabilities.
     - `aggregate_metrics`: given a list of estimated losses, computes all metrics (based on per-token similarity).
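To make the loss-estimation step more concrete, here is a minimal sketch of how a sequence loss could be approximated from per-token similarities with a sigmoid mapping. The function name `estimate_loss_sigmoid`, the `scale` parameter, and the assumption that similarities are real-valued scores are hypothetical and chosen for illustration; the actual mapping in `get_losses_from_dict` may differ.

```python
import math
from typing import Sequence


def estimate_loss_sigmoid(similarities: Sequence[float], scale: float = 1.0) -> float:
    """Hypothetical helper: approximate a sequence loss from per-token similarities.

    Each similarity s is mapped to a pseudo-probability p = sigmoid(scale * s),
    and the loss is the mean negative log of those pseudo-probabilities.
    """
    if not similarities:
        raise ValueError("need at least one token similarity")
    total = 0.0
    for s in similarities:
        x = scale * s
        # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically
        # stable form so large negative x does not overflow exp().
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(similarities)
```

Under this assumed mapping, a similarity of zero corresponds to a pseudo-probability of 0.5, and higher similarities yield lower estimated loss.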

## BERT-score-based workflow

- `./core/get_predic.py`: generates model responses for the original prefix and for the perturbed prefix.
- `./main/baseline.py`: computes BERT-based metrics.

## Scoring by betting (e-value)

- `./main/linear_di_e_vals.py`: runs a sequential ablation framework built on online Kernel MMD hypothesis testing. It takes precomputed language model metrics, applies normalization and outlier handling, trains simple classifiers, and tests for distributional differences across data splits. It outputs detailed CSV reports with statistical metrics and performance traces.
- `./main/aggregate_evalue_results.py`: aggregates sequential testing results by summarizing performance metrics and generating wealth trajectory plots (with confidence intervals and significance thresholds) for each dataset, saving outputs per dataset along with a shared legend at the root.
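
The wealth trajectories above come from testing by betting. The sketch below illustrates the general idea under simplifying assumptions: paired per-sample losses for suspect and held-out validation data, a fixed bet size, and a `tanh` payoff. The function `betting_wealth` and all of its parameters are hypothetical; this is not the online Kernel MMD procedure used in `linear_di_e_vals.py`.

```python
import math
from typing import Iterable, List, Optional, Tuple


def betting_wealth(
    suspect_losses: Iterable[float],
    validation_losses: Iterable[float],
    lam: float = 0.5,
    alpha: float = 0.05,
) -> Tuple[List[float], Optional[int]]:
    """Illustrative testing-by-betting loop (not the repository's implementation).

    Returns the wealth trajectory and the 1-based step at which wealth first
    reaches 1 / alpha, or None if it never does.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so wealth stays positive")
    wealth = 1.0
    trajectory: List[float] = []
    stop_at: Optional[int] = None
    for step, (s, v) in enumerate(zip(suspect_losses, validation_losses), start=1):
        # Payoff in [-1, 1]: positive when the suspect sample has lower loss.
        # If suspect and validation samples are exchangeable (the null),
        # the difference is symmetric around zero, so the payoff has mean zero.
        payoff = math.tanh(v - s)
        wealth *= 1.0 + lam * payoff
        trajectory.append(wealth)
        # By Ville's inequality, wealth crosses 1 / alpha with probability
        # at most alpha under the null, at any stopping time.
        if stop_at is None and wealth >= 1.0 / alpha:
            stop_at = step
    return trajectory, stop_at
```

Because the wealth process is valid at every step, samples can be queried one at a time and the audit can stop as soon as the threshold is crossed, or continue if the evidence is still inconclusive.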

## Metric Types

- Token-level black-box features: computed via `./main/metrics.py`, `./main/di.py`, and `./main/raw_values.py` (per-token similarity based).
- Sequence-level black-box features: computed via `./core/get_predic.py` and `./main/baseline.py` (BERT-based).

## Pipeline

First, create a Python environment:

```bash
python3 -m venv .badi  # Create a local virtual environment.
source .badi/bin/activate  # Activate it (repeat in any new shell before running steps below).
pip install -r requirements.txt  # Install project dependencies.
python -m spacy download en_core_web_sm # Download and install the small English core model for spaCy
```

Process the dataset:

```bash
python3 ./core/download_dataset.py ./data/data # first download the dataset

python3 ./core/split_dataset.py --input-dir ./data/data --output-dir ./data/full_split_data \
   --tokenizer "EleutherAI/pythia-410m-deduped" --min-tokens 164 --max-val-train

python3 ./core/split_text_prefix_suffix.py \
    --input-dir ./data/full_split_data \
    --output-dir ./data/full_prefix_suffix_data \
    --model-name "EleutherAI/pythia-410m-deduped" \
    --suff-tokens 64 \
    --max-val-train

```

Process the dataset with perturbations:

Note: this step requires torch 2.3; otherwise torchtext will not be compatible with PyTorch. After running `split_text_prefix_trans_suffix.py`, reinstall the PyTorch version specified in `requirements.txt`.

```bash
python3 ./core/split_text_prefix_trans_suffix.py \
    --input-dir ./data/full_split_data/ \
    --output-dir ./data/full_prefix_trans_suffix_data \
    --model-name "EleutherAI/pythia-410m-deduped" \
    --suff-tokens 64 \
    --max-val-train
```

For sequence-level metrics:

```bash
# generate the model predictions
python3 ./core/get_predic.py \
   --data_dir ./data/full_prefix_trans_suffix_data \
   --n_suff 64 \
   --model_name "EleutherAI/pythia-410m-deduped" \
   --batch_size 18 \
   --max-val-train \
   --use-transformations

python3 ./core/get_predic.py \
   --data_dir ./data/full_prefix_trans_suffix_data \
   --n_suff 64 \
   --model_name "EleutherAI/pythia-410m-deduped" \
   --batch_size 18 \
   --max-val-train

# run bert scores for transformations
python3 ./main/baseline.py \
   --model_name EleutherAI/pythia-410m-deduped \
   --data_dir ./data/full_prefix_trans_suffix_data/processed_predic/ \
   --result_output ./data/results/full_baseline \
   --n_suff 64 \
   --metrics_folder ./data/metrics/bert \
   --max-val-train \
   --between-predictions-bert-score

# run bert score for standard test
python3 ./main/baseline.py \
    --model_name EleutherAI/pythia-410m-deduped \
    --data_dir ./data/full_prefix_suffix_data/processed_predic/ \
    --result_output ./data/results/full_baseline \
    --n_suff 64 \
    --metrics_folder ./data/metrics/bert \
    --max-val-train

```

For token-level metrics:

```bash
# 1. step - generate raw_values
python3 ./main/batch_raw_values.py \
        --data_dir ./data/full_split_data \
        --model_name EleutherAI/pythia-410m-deduped \
        --result_output ./data/raw_values \
        --cache_dir ~/.cache \
        --max-val-train

# 2. step - generate metrics
python3 ./main/batch_di.py \
        --data_dir ./data \
        --model_name EleutherAI/pythia-410m-deduped \
        --result_output ./data/metrics/final_metrics/metrics_sigmoid \
        --cache_dir ~/.cache \
        --max-val-train \
        --loss_estimation_method "sigmoid" \
        --reference_model_names ""

# 2.1 merge token-level metrics with sequence-level metrics
python3 "./utils/merge_metrics.py" ./data/metrics/bert ./data/metrics/final_metrics/metrics_sigmoid EleutherAI/pythia-410m-deduped

# 3. step - for e-values
python3 ./main/linear_di_e_vals.py \
  --dataset_name wikipedia \
  --model_name EleutherAI/pythia-410m-deduped \
  --lambda_max 0.8 \
  --num_trials 1000 \
  --online_epochs 30 \
  --metrics_path ./data/metrics/final_metrics/metrics_sigmoid/EleutherAI_pythia-410m-deduped \
  --output_dir ./data/results/final_results/e_values

python ./main/aggregate_evalue_results.py \
  --root ./data/results/final_results/e_values \
  --alpha1 0.05 \
  --alpha2 0.01

# 3. step - for p-values
python3 ./main/batch_linear_di.py \
          --results_dir ./data/metrics/final_metrics/metrics_sigmoid \
          --output_dir ./data/results/final_results/stats_sigmoid \
          --model_name EleutherAI/pythia-410m-deduped \
          --num_random 4 \
          --percent_to_train 0.5 \
          --outliers "mean" \
          --no-test

python3 ./utils/plot_linear_di_heatmap.py \
          --base_dir ./data/results/final_results/stats_sigmoid \
          --pvalues_subdir "p_values/mean-outliers/train-normalize-selected_features/EleutherAI_pythia-410m-deduped" \
          --output_name "linear_di_heatmap.pdf" \
          --number_to_combine 4 \
          --title "heatmap"
```
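
For contrast with the sequential route, the fixed-sample p-value step can be pictured as a one-sided two-sample test on per-sample scores. The sketch below uses a permutation test as a simple stand-in. The function `permutation_p_value` and its inputs are hypothetical, and this is not the test implemented in `batch_linear_di.py`.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    suspect_scores: Sequence[float],
    validation_scores: Sequence[float],
    num_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided permutation test: are suspect scores lower on average?

    Illustrative stand-in only; the suspect set and the significance level
    must be fixed before looking at the result for the p-value to be valid.
    """
    rng = random.Random(seed)
    observed = mean(validation_scores) - mean(suspect_scores)
    pooled = list(suspect_scores) + list(validation_scores)
    n = len(suspect_scores)
    hits = 0
    for _ in range(num_permutations):
        # Relabel samples at random and recompute the mean difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (num_permutations + 1)
```

Unlike the wealth process used for e-values, this p-value loses its guarantee if more suspect samples are added after an inconclusive result and the test is rerun.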

This code is inspired by the work from this repository: [pratyushmaini/llm_dataset_inference](https://github.com/pratyushmaini/llm_dataset_inference). Many thanks to pratyushmaini for the valuable contribution and inspiration.
