Delta-MIA: Measuring Membership Inference Attacks in Large Language Models via self-Contrast Framework

Xuewei Yang; Baiyu Huang; Junjie Wang; Shaoning Sun; Jiachen Yu; Yujiu Yang

16 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Large Language Models, Membership Inference Attacks, Self Contrast

TL;DR: Delta-MIA is an interventional self-contrast framework that compares pre- and post-exposure model states to isolate genuine membership signals, enabling unbiased, fine-grained, and transferable evaluation.

Abstract: Membership inference attacks (MIAs) underpin privacy risk assessment, provenance, and compliance for large language models (LLMs). Observational evaluations confound membership with distribution shift, hide sample-level behavior, and assume access to proprietary corpora. We present Delta-MIA, an interventional self-contrast framework that isolates genuine membership signals by comparing a model before and after controlled exposure to the same dataset. The pipeline records pre-exposure responses on verifiably unseen data, performs full-parameter fine-tuning on that data followed by stabilization, and computes sample-level deltas. We introduce three diagnostics, explained variance ratio (EVR), mean vertical distance (MVD), and above-diagonal ratio (ADR), which quantify noise, separation, and baseline detectability. Re-evaluating nine MIA methods, we find that several remain robust once distribution shift is removed, while others, such as DC-PDD and Con-ReCaLL, decline markedly; Min-K%++ shows strong separation with high MVD. Delta-MIA enables bias-free, interpretable, and transferable evaluation of MIAs in LLMs.
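
The sample-level deltas described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the diagnostics, so the definitions here are assumptions inferred from their names: ADR is taken as the fraction of samples whose post-exposure score lies above the pre-exposure score (above the y = x diagonal), and MVD as the mean of post minus pre. EVR is omitted because its definition cannot be inferred from the abstract. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def sample_deltas(pre_scores, post_scores):
    """Per-sample change in an MIA score after controlled exposure."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must be paired per sample")
    if not pre_scores:
        raise ValueError("at least one sample is required")
    return [post - pre for pre, post in zip(pre_scores, post_scores)]


def above_diagonal_ratio(pre_scores, post_scores):
    """Assumed ADR: share of samples with post-exposure score > pre-exposure score."""
    deltas = sample_deltas(pre_scores, post_scores)
    return sum(d > 0 for d in deltas) / len(deltas)


def mean_vertical_distance(pre_scores, post_scores):
    """Assumed MVD: average vertical offset of (pre, post) points from y = x."""
    return mean(sample_deltas(pre_scores, post_scores))


# Toy MIA scores for four samples (illustrative values only).
pre = [0.10, 0.40, 0.30, 0.80]
post = [0.60, 0.90, 0.20, 0.90]

adr = above_diagonal_ratio(pre, post)    # three of four samples increase
mvd = mean_vertical_distance(pre, post)  # average increase across samples
print(f"ADR={adr:.2f}, MVD={mvd:.2f}")
```

Because each sample is scored by the same model before and after exposure, the pre-exposure score acts as that sample's own baseline, which is the self-contrast idea the abstract describes.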

Primary Area: datasets and benchmarks

Submission Number: 6980
