Delta-MIA: Measuring Membership Inference Attacks in Large Language Models via a Self-Contrast Framework
Keywords: Large Language Models, Membership Inference Attacks, Self-Contrast
TL;DR: Delta-MIA is an interventional self-contrast framework that compares pre- and post-exposure model states to isolate genuine membership signals, enabling unbiased, fine-grained, and transferable evaluation.
Abstract: Membership inference attacks (MIAs) underpin privacy risk assessment, data provenance, and compliance for large language models (LLMs).
Existing observational evaluations confound membership with distribution shift, hide sample-level behavior, and assume access to proprietary training corpora.
We present Delta-MIA, an interventional self-contrast framework that isolates genuine membership signals by comparing a model before and after controlled exposure to the same dataset.
The pipeline records pre-exposure responses on verifiably unseen data, performs full-parameter fine-tuning on that data followed by stabilization, and then computes sample-level deltas.
We introduce three diagnostics: explained variance ratio (EVR), mean vertical distance (MVD), and above-diagonal ratio (ADR), which quantify noise, separation, and baseline detectability.
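The abstract does not give formulas for the sample-level deltas or the three diagnostics. As a minimal sketch under assumed definitions (not necessarily the paper's own), each sample can be treated as a point (pre-exposure score, post-exposure score), with attack scores oriented so that higher means more member-like. Under that reading, the delta is post minus pre, MVD is the mean of those deltas (the average vertical distance from the y = x diagonal), ADR is the share of points strictly above the diagonal, and EVR is the share of variance captured by the first principal component of the point cloud. The function `delta_diagnostics`, the `DeltaDiagnostics` record, and all three metric definitions are hypothetical.

```python
from dataclasses import dataclass
from math import nan, sqrt
from statistics import fmean


@dataclass
class DeltaDiagnostics:
    deltas: list[float]  # per-sample change in attack score caused by exposure
    evr: float           # ASSUMED definition: first-PC explained variance ratio
    mvd: float           # ASSUMED definition: mean vertical distance from y = x
    adr: float           # ASSUMED definition: fraction of points above y = x


def delta_diagnostics(pre: list[float], post: list[float]) -> DeltaDiagnostics:
    """Hypothetical sketch. pre[i] and post[i] are attack scores for sample i
    before and after exposure, oriented so that higher = more member-like."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length score lists with >= 2 samples")
    n = len(pre)

    # Sample-level delta: how much exposure moved each sample's score.
    deltas = [b - a for a, b in zip(pre, post)]

    # MVD (assumed): vertical distance of (pre, post) from y = x, averaged.
    mvd = fmean(deltas)

    # ADR (assumed): share of samples strictly above the diagonal (post > pre).
    adr = sum(d > 0 for d in deltas) / n

    # EVR (assumed): largest eigenvalue of the 2x2 sample covariance matrix
    # divided by its trace, i.e. variance captured by the first principal axis.
    mx, my = fmean(pre), fmean(post)
    sxx = sum((x - mx) ** 2 for x in pre) / (n - 1)
    syy = sum((y - my) ** 2 for y in post) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post)) / (n - 1)
    trace = sxx + syy
    if trace == 0:
        evr = nan  # all points coincide; the ratio is undefined
    else:
        lam1 = trace / 2 + sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
        evr = lam1 / trace

    return DeltaDiagnostics(deltas=deltas, evr=evr, mvd=mvd, adr=adr)
```

For example, if every sample's score rises by exactly 1.0 after exposure, all points sit on the line y = x + 1, so this sketch reports MVD = 1.0, ADR = 1.0, and EVR = 1.0 (no scatter around the principal axis).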
Re-evaluating nine MIA methods, we find that several remain robust once distribution shift is removed, while others, such as DC-PDD and Con-ReCaLL, decline markedly; Min-K%++ shows strong separation with high MVD.
Delta-MIA enables bias-free, interpretable, and transferable evaluation of MIAs in LLMs.
Primary Area: datasets and benchmarks
Submission Number: 6980