Securing Author Privacy using Large Language Models

ACL ARR 2024 June Submission 3990 Authors

16 Jun 2024 (modified: 22 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: Sophisticated machine learning models can determine the author of a given document using stylometric features or contextualized word embeddings. In response, researchers have developed authorship obfuscation methods to disguise these identifying characteristics. Despite the growing popularity of large language models like GPT-4, their utility for this purpose has not previously been studied. In this work, we explore the application of popular large language models to the task of authorship obfuscation and show that they can outperform a state-of-the-art approach. We analyze their behavior and suggest a personalized prompting technique for improving performance on more difficult authors. Our code and experiments will be made publicly available.

Paper Type: Short

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Adversarial attacks, feature attribution, prompting, security and privacy

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 3990
