Training Data Extraction Attack from Large Language Models in Federated Learning Through Frequent Sequence Mining

ACL ARR 2024 June Submission 5799 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large language models (LLMs) are vulnerable to data extraction attacks because they tend to memorize verbatim training data. Federated Learning (FL), in contrast, has the potential to mitigate such privacy leakage. This underscores the need to assess the privacy risks of LLMs trained with FL algorithms, a question that remains underexplored. In this study, we measure the privacy leakage of LLMs trained with FL algorithms on public datasets extended with automatically annotated Personally Identifiable Information (PII), evaluating both the leakage of PII and the reproduction of training examples. Through extensive experiments, we find that FL algorithms do mitigate privacy leakage compared with models trained on centralized data. In addition, we discover a novel data extraction attack, which we call cross-client security theft, that can recover up to 40\% of the unique PII mentions on target devices by accessing only one FL participant. These findings highlight the privacy risks that remain when training LLMs with FL and underscore the need for new protective mechanisms in future research.
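The abstract does not spell out the mining step behind the attack; as a rough illustration only, the sketch below shows generic frequent n-gram counting over text sampled from a single participant's model, which is one plausible reading of "frequent sequence mining." The function names, parameters, and toy data here are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_sequences(samples: Iterable[str], n: int = 5, min_count: int = 3) -> List[Tuple[tuple, int]]:
    """Count word n-grams across generated samples and return those that
    recur often enough to be candidate memorized sequences (e.g. PII strings)."""
    counts: Counter = Counter()
    for text in samples:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    return [(ngram, c) for ngram, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    # Toy stand-in for generations sampled from one FL participant's model;
    # in an actual attack these would be many unconditional model samples.
    generations = [
        "contact Alice Chen at alice.chen@example.com for details",
        "please contact Alice Chen at alice.chen@example.com today",
        "the meeting notes mention nothing sensitive at all",
    ]
    for ngram, count in frequent_sequences(generations, n=4, min_count=2):
        print(count, " ".join(ngram))
```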
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: FL, NLP, LLM
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Chinese
Submission Number: 5799