I Am No One: Style-Aware Paraphrasing for Text Anonymization

ACL ARR 2025 May Submission5977 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Online content, even when posted under pseudonyms, poses significant privacy risks because it often contains subtle stylistic cues that can be exploited to identify authors. Various studies have highlighted the importance of adding noise to textual data for anonymization, particularly through differential privacy; however, such methods often degrade the quality and utility of the original text. In this work, we propose an alternative approach to text anonymization that leverages the ability of pretrained large language models to capture and modify subtle stylistic attributes present in user-generated text. Our method constructs an author’s stylistic profile from minimal text samples and rewrites the author’s text using targeted paraphrasing to obscure identifiable style markers while preserving the original content. This strategic style manipulation allows us to significantly reduce the effectiveness of authorship attribution attacks. On a real-world Google review dataset, our approach achieves a 50% reduction in authorship attribution success rates while maintaining content quality. We conduct extensive experiments across multiple datasets and rigorously evaluate our approach to assess its effectiveness in balancing the privacy-utility trade-off.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Authorship attribution, Text Anonymization, Style Profiling
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 5977
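
The abstract describes building an author's stylistic profile and then paraphrasing to obscure style markers. As a rough illustration of what such a profile might contain, the sketch below computes a few hand-crafted surface features and a distance between two profiles. This is not the submission's method, which relies on pretrained large language models. The feature set, the function-word list, and the names `style_profile` and `profile_distance` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical function-word list; the submission does not specify its style markers.
FUNCTION_WORDS = ("the", "and", "but", "of", "to", "in", "that", "very")


def style_profile(text: str) -> dict:
    """Summarize a few surface-level style markers of `text` (illustrative only)."""
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)
    profile = {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "exclamation_rate": text.count("!") / n_words,
        "comma_rate": text.count(",") / n_words,
    }
    # Relative frequency of each function word.
    for w in FUNCTION_WORDS:
        profile[f"fw_{w}"] = counts[w] / n_words
    return profile


def profile_distance(a: dict, b: dict) -> float:
    """L1 distance between two profiles that share the same keys."""
    return sum(abs(a[k] - b[k]) for k in a)
```

Under these assumptions, a paraphrase that obscures style would be expected to move the rewritten text's profile away from the author's original profile, that is, to increase `profile_distance`, while a separate content-similarity check would be needed to confirm that meaning is preserved.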