Rigging the Foundation: Manipulating Pre-training for Advanced Membership Inference Attacks

Published: 01 Jan 2025, Last Modified: 15 Oct 2025 · SP 2025 · CC BY-SA 4.0
Abstract: Significant advances in computing power have led to a surge in model complexity. Training such models today increasingly relies on transfer learning, where models are pre-trained on large datasets and later fine-tuned for different domains, allowing the knowledge in the pre-trained model to be reused and customized for specific downstream tasks. However, this learning paradigm also opens new attack surfaces on the fine-tuned model. In particular, we study a previously unexplored privacy risk: an adversary who manipulates the pre-training process can threaten the private data a downstream user employs to fine-tune the model. A manipulated pre-trained model can render its fine-tuned version vulnerable to privacy attacks such as membership inference attacks (MIAs), in which the presence of a given sample in the fine-tuning dataset is determined by querying the vulnerable model. A unique challenge in understanding this privacy risk is how to amplify membership leakage while preserving the performance of the fine-tuned model. To address this challenge, we introduce a new technique, active robustness overfitting (ARO), which actively induces robustness overfitting during pre-training. This amplifies membership leakage in the downstream task without affecting its accuracy, while also keeping the attack stealthy. Our extensive evaluations across various datasets and diverse MIA scenarios demonstrate that our method effectively amplifies membership leakage while preserving satisfactory downstream test accuracy, contributing to a better understanding of the privacy risks introduced by transfer learning.
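The threat model above assumes the attacker infers membership purely by querying the fine-tuned model. As background on what such a query looks like, below is a minimal sketch of the simplest query-based attack, a loss-thresholding MIA. This is a standard baseline from the MIA literature, not the paper's ARO technique, and the function name and `threshold` calibration are illustrative assumptions.

```python
# A generic loss-thresholding membership inference attack (PyTorch sketch).
# Samples whose loss under the queried model falls below a calibrated
# threshold are guessed to be members of the fine-tuning dataset.
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_threshold_mia(model, x, y, threshold):
    """Guess whether (x, y) was in the fine-tuning set.

    model:     the (possibly vulnerable) fine-tuned classifier
    x, y:      a candidate input and its label (tensors)
    threshold: loss cutoff, calibrated on samples known to be non-members
    Returns True if the sample is inferred to be a member.
    """
    model.eval()
    logits = model(x.unsqueeze(0))                 # single query to the model
    loss = F.cross_entropy(logits, y.unsqueeze(0)) # per-sample loss
    return loss.item() < threshold                 # low loss => likely member
```

The abstract's claim is that a pre-trained model manipulated via ARO widens the loss gap between members and non-members of the fine-tuning set, which is exactly the signal this kind of threshold test exploits.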