Keywords: Large Language Models, Strategic Behavior, Information Design
Abstract: Large language models (LLMs) have demonstrated persuasive capabilities comparable to those of humans, offering potential benefits while raising societal concerns. However, systematically evaluating LLM persuasion is inherently challenging, because the effectiveness of persuasion among humans varies significantly across domains. In this paper, we take a theory-driven approach to provide a scalable and principled framework for studying LLMs as persuaders. Grounded in Bayesian persuasion theory, we repurpose human-human persuasion datasets to construct environments for evaluating and training LLMs as strategic persuaders. Our results reveal that frontier models consistently achieve high persuasion gains and exhibit sophisticated persuasion strategies that align with theoretical characterizations. Building on this, we use reinforcement learning to train LLMs for strategic persuasion in our environments, and find that even small models can thereby achieve significantly higher persuasion gains.
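As background for the framing above, a minimal sketch of the sender's problem in the canonical Bayesian persuasion model (Kamenica and Gentzkow, 2011), which the abstract invokes; the notation below (state ω with prior μ₀, receiver action a, signal s, receiver utility u, sender utility v, signaling scheme π) is standard in that literature and is assumed here, not taken from the submission:

\[
\max_{\pi : \Omega \to \Delta(S)} \;
\mathbb{E}_{\omega \sim \mu_0}\,
\mathbb{E}_{s \sim \pi(\cdot \mid \omega)}
\big[\, v\big(a^*(\mu_s), \omega\big) \,\big]
\quad \text{s.t.} \quad
a^*(\mu) \in \arg\max_{a \in A} \; \mathbb{E}_{\omega \sim \mu}\big[\, u(a, \omega) \,\big],
\]

where \(\mu_s\) is the receiver's Bayesian posterior over \(\Omega\) after observing signal \(s\). The sender commits to \(\pi\) before learning \(\omega\), and the receiver best-responds to the induced posterior; a natural reading of "persuasion gain" in this setting is the sender's improvement over the no-information benchmark, though the submission's exact operationalization may differ.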
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14687