Enhancing Pre-Training Data Detection via Multi-Layer Concentration Analysis in Large Language Models

Agents4Science 2025 Conference Submission 216 Authors

15 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0
Keywords: large language model, privacy
Abstract: The detection of pre-training data in large language models has become crucial for privacy and copyright compliance, yet existing approaches fundamentally misunderstand how neural networks encode memorization patterns. While current methods like Min-K++ focus exclusively on final-layer outputs, they ignore the rich memorization signatures that emerge throughout the network hierarchy, a critical oversight that limits detection accuracy and robustness. We introduce Multi-Layer Concentration Analysis, a comprehensive framework that captures how probability distributions evolve and concentrate across multiple network layers, revealing memorization patterns invisible to single-layer approaches. Our method extracts theoretically grounded concentration features (Shannon entropy, Gini coefficient, top-k concentration measures, and effective vocabulary size) from strategically selected early, middle, and late layers, then fuses these multi-layer signatures with Min-K++ using adaptive weighting. Extensive evaluation on the WikiMIA benchmark with Pythia-2.8b and Mamba-1.4b-hf models demonstrates substantial improvements, achieving up to 70.3% AUROC and a 1.9-percentage-point gain for state-space models on 128-token sequences. Critically, our analysis uncovers fundamental architectural differences: state-space models like Mamba exhibit distinct multi-layer memorization signatures that can be leveraged for superior detection, while transformers show more modest improvements. This architectural insight opens new directions for detection methodology and provides the first systematic analysis of how different neural architectures encode training data signatures across network depth.
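The sketch below illustrates the kind of per-layer concentration statistics the abstract names (Shannon entropy, Gini coefficient, top-k mass, effective vocabulary size) and a simple fused score. It is a minimal illustration, not the authors' implementation: it assumes next-token probability vectors for selected early/middle/late layers are already available (e.g., via a logit-lens-style projection), and the function names `concentration_features`, `multilayer_score`, the fixed fusion weight `alpha`, and the `min_k_score` argument are hypothetical placeholders; the paper's adaptive weighting is simplified here to a fixed weight.

```python
import numpy as np

def concentration_features(probs: np.ndarray, top_k: int = 10) -> dict:
    """Concentration statistics for one next-token distribution (one layer, one position)."""
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()                                    # normalize defensively
    eps = 1e-12
    entropy = -np.sum(p * np.log(p + eps))             # Shannon entropy (nats)
    sorted_p = np.sort(p)                              # ascending order for the Gini formula
    n = p.size
    gini = float((2 * np.arange(1, n + 1) - n - 1) @ sorted_p / n)
    top_k_mass = float(sorted_p[-top_k:].sum())        # probability mass on the top-k tokens
    effective_vocab = float(np.exp(entropy))           # perplexity-style effective vocabulary size
    return {"entropy": float(entropy), "gini": gini,
            "top_k_mass": top_k_mass, "effective_vocab": effective_vocab}

def multilayer_score(layer_probs: list, min_k_score: float, alpha: float = 0.5) -> float:
    """Fuse averaged multi-layer concentration with a Min-K++-style score via a fixed weight.

    `layer_probs` holds one probability vector per selected (early/middle/late) layer.
    Lower entropy (sharper concentration) is treated here as evidence of memorization.
    """
    feats = [concentration_features(p) for p in layer_probs]
    concentration_signal = -float(np.mean([f["entropy"] for f in feats]))
    return alpha * min_k_score + (1 - alpha) * concentration_signal

# Toy usage with synthetic distributions (stands in for three selected layers of a real model).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size = 32000
    layer_probs = [rng.dirichlet(np.full(vocab_size, 0.1)) for _ in range(3)]
    print(multilayer_score(layer_probs, min_k_score=-3.2))
```

In practice the per-layer features would be computed from a model's intermediate hidden states projected through the output head, aggregated over token positions, and the fusion weight would be tuned rather than fixed.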
Supplementary Material: zip
Submission Number: 216