Securing Author Privacy using Large Language Models

ACL ARR 2024 June Submission 3990 Authors

16 Jun 2024 (modified: 22 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Sophisticated machine learning models can identify the author of a given document using stylometric features or contextualized word embeddings. In response, researchers have developed authorship obfuscation methods to disguise these identifying characteristics. Despite the growing popularity of large language models such as GPT-4, their utility for this purpose has not previously been studied. In this work, we explore the application of popular large language models to authorship obfuscation and show that they can outperform a state-of-the-art approach. We analyze their behavior and propose a personalized prompting technique that improves performance on authors who are more difficult to obfuscate. Our code and experiments will be made publicly available.
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Adversarial attacks, feature attribution, prompting, security and privacy
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3990