Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI vs. Human Linguistic Differences

ACL ARR 2026 January Submission3547 Authors

04 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Persuasion, Multilingual Benchmark, Linguistic Analysis, AI-generated text
Abstract: Large Language Models (LLMs) can generate highly persuasive text, raising concerns about their misuse for propaganda, manipulation, and other harmful purposes. This leads us to our central question: $\textit{Is LLM-generated persuasion more difficult to automatically detect than human-written persuasion?}$ To address this, we categorize controllable generation approaches for producing persuasive content with LLMs and introduce Persuaficial, a high-quality multilingual benchmark covering six languages: English, German, Polish, Italian, French, and Russian. Using this benchmark, we conduct extensive empirical evaluations comparing human-authored and LLM-generated persuasive texts. We find that although overtly persuasive LLM-generated texts can be easier to detect than human-written ones, subtle LLM-generated persuasion consistently degrades automatic detection performance. Beyond detection performance, we provide the first comprehensive linguistic analysis contrasting human- and LLM-generated persuasive texts, offering insights that may guide the development of more interpretable and robust detection tools.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets, benchmarking, language resources, multilingual corpora
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English, German, Polish, Italian, French, Russian
Submission Number: 3547