AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models

Published: 14 Jun 2026, Last Modified: 17 Jun 2026
Venue: ICML 2026 Workshop MusIML Poster
Readers: Everyone
License: CC BY 4.0
Keywords: large language models, persuasion, resistance to persuasion, adversarial dialogue, social influence, interactive evaluation, negotiation game, llm safety, dialogue systems, social engineering, behavioral evaluation, multi-turn interaction
TL;DR: AREG benchmarks persuasion and resistance in LLMs via adversarial financial negotiation. Across 280 games, offensive and defensive abilities were weakly correlated, with incremental persuasion and verification-based defense proving most effective.
Abstract: Evaluating LLM social intelligence requires moving beyond static text toward dynamic interactions. We introduce the Adversarial Resource Extraction Game (AREG), a benchmark operationalizing persuasion and resistance as a multi-turn, zero-sum financial negotiation. A tournament across frontier models reveals that offensive and defensive capabilities are empirically dissociated, with only a weak correlation between them ($\rho = 0.33$). While models show a systematic defensive advantage, effectiveness depends heavily on dialogue structure: incremental persuasion outperforms single asks, and verification-seeking defends better than explicit refusal. These findings demonstrate that social influence is not a monolithic capability, highlighting the need for dual-sided evaluation to uncover asymmetric behavioral vulnerabilities.
Track: Track 2: ML Research by Muslim Authors
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Non Archival Confirmation: I understand that submissions to MusIML are non-archival and can be submitted to other venues.
Submission Number: 17
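
Note on the reported correlation: the abstract gives $\rho = 0.33$ between offensive and defensive capability but does not say how it was computed. The sketch below assumes a Spearman rank correlation over one offense score and one defense score per model. That choice of statistic, the function names, and the sample numbers are illustrative assumptions, not the authors' method or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up placeholder scores (hypothetical, NOT results from the paper):
    # one offense score and one defense score per model.
    offense = [0.42, 0.31, 0.55, 0.20, 0.38]
    defense = [0.70, 0.81, 0.64, 0.77, 0.59]
    print(f"rho = {spearman_rho(offense, defense):.2f}")
```

A rank-based statistic is a natural fit when per-model scores are on different scales or are not normally distributed, but the paper itself should be consulted for the statistic actually used.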