Keywords: llm, transsuasion, persuasion, PersuasionArena, PersuasionBench, advertising, marketing, behavioral science, behavior in the wild
Abstract: Large Language Models (LLMs) are increasingly being used in workflows involving generating content to be consumed by humans (*e.g.,* marketing) and also in directly interacting with humans (*e.g.,* through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce **PersuasionBench** and **PersuasionArena**, the first large-scale benchmark and arena containing a battery of tasks to automatically measure the simulative and generative persuasion abilities of large language models. We introduce **transsuasion** (trans = carrying across, suasion = the act of persuading), a novel task of transforming non-persuasive language into persuasive content while preserving other factors determining persuasiveness (sender, receiver, time, and channel). Our findings indicate that the simulative persuasion capabilities of LLMs are barely above random; however, their generative persuasion capabilities are much better. For instance, GPT-4o loses only 36% of the time when playing against the best human persuader. Further, we find that LLMs' persuasiveness correlates positively with model size, but smaller models can also be made to have a higher persuasiveness than much larger models. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that simple metrics like this alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at [behavior-in-the-wild.github.io/measure-persuasion](https://behavior-in-the-wild.github.io/measure-persuasion), to advance our understanding of AI-driven persuasion and its societal implications.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9756
Loading