Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models

ACL ARR 2025 July Submission 88 Authors

22 Jul 2025 (modified: 29 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Federated fine-tuning of large language models (FedLLMs) is a promising approach for achieving strong model performance while preserving data privacy in sensitive domains. However, the inherent memorization ability of LLMs makes them vulnerable to training-data extraction attacks. To investigate this risk, we introduce simple yet effective extraction attack algorithms designed specifically for FedLLMs. Unlike prior “verbatim” extraction attacks, which assume access to fragments of all training data, our approach operates under a more realistic threat model: the attacker has access to only a single client’s data and aims to extract previously unseen personally identifiable information (PII) from other clients. This requires leveraging the contextual prefixes held by the attacker to generalize across clients. To evaluate the effectiveness of our approaches, we propose two rigorous metrics, coverage rate and efficiency, and extend a real-world legal dataset with PII annotations aligned with the CPIS, GDPR, and CCPA standards, achieving 89.9% human-verified annotation precision. Experimental results show that our method can extract up to 56.57% of victim-exclusive PII, with “Address,” “Birthday,” and “Name” being the most vulnerable categories. Our findings underscore the pressing need for robust defense strategies and contribute a new benchmark and evaluation framework for future research in privacy-preserving federated learning. The data and code will be made publicly available to facilitate reproducibility.
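The abstract names two evaluation metrics, coverage rate and efficiency, without giving their formal definitions. As a rough illustration only, the following minimal sketch shows one plausible way such metrics could be computed; the function names, signatures, and formulas here are assumptions for exposition, not the paper's actual definitions.

```python
# Hypothetical sketch of the two metrics named in the abstract.
# The paper's exact formulas are not given on this page; the
# definitions below are assumptions for illustration only.

def coverage_rate(extracted_piis: set[str], victim_piis: set[str]) -> float:
    """Assumed definition: fraction of victim-exclusive PII items
    that the attack successfully recovers."""
    if not victim_piis:
        return 0.0
    return len(extracted_piis & victim_piis) / len(victim_piis)


def efficiency(extracted_piis: set[str], victim_piis: set[str],
               num_queries: int) -> float:
    """Assumed definition: unique true PII items recovered per
    generation query issued against the fine-tuned model."""
    if num_queries == 0:
        return 0.0
    return len(extracted_piis & victim_piis) / num_queries


# Toy usage example with made-up data:
victim = {"Alice Zhang", "1990-01-01", "12 Example Rd"}
extracted = {"Alice Zhang", "12 Example Rd", "not-a-real-pii"}
print(coverage_rate(extracted, victim))    # ~0.667 (2 of 3 recovered)
print(efficiency(extracted, victim, 100))  # 0.02 (2 hits over 100 queries)
```

Under this reading, coverage rate measures how much of the victim's annotated PII is exposed, while efficiency normalizes that exposure by attack cost; the paper itself should be consulted for the authoritative definitions.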
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Data Extraction Attack, PII Extraction, Federated Large Language Models
Contribution Types: Model analysis & interpretability
Languages Studied: Chinese
Submission Number: 88