Keywords: crisis communication, social simulation
Abstract: While large language models (LLMs) have improved on procedurally verifiable tasks, their behavior in high-stakes institutional crises remains under-evaluated because success depends on evolving stakeholder perceptions rather than a single verifiable answer. Existing benchmarks focus on static, single-turn competence and provide limited coverage of risk-sensitive, goal-directed communication under interaction. We introduce BrandEval, a two-track benchmark that pairs a rubric-based static diagnostic with BrandPolis, a dynamic multi-agent sandbox of competitive, partially observable markets. We also introduce Strategic Rationale (SR), a lightweight decision workflow, and BrandSRD, a Chinese dataset of crisis-response decision points with human-validated preferences. Using BrandSRD, we build a reference SR-based Strategic Agent and study how communication styles affect long-horizon trust and tail risk. These resources enable controlled stress testing of LLM crisis communication, exposing failure modes and societal risks that single-turn evaluation may miss. BrandSRD, BrandEval, and BrandPolis will be released publicly.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: benchmarking, evaluation methodologies
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Chinese, English
Submission Number: 5980