Keywords: Political Consensus Finding, Parliament Deliberation, LLM Benchmark
TL;DR: We introduce EuroCon, a benchmark constructed from 2,225 real European Parliament deliberation records, which evaluates the ability of LLMs to find political consensus across various parliament settings.
Abstract: Achieving political consensus is crucial yet challenging for the effective functioning of social governance.
Although frontier AI systems, represented by large language models (LLMs), have developed rapidly in recent years, their capabilities in this area remain understudied. In this paper, we introduce EuroCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament spanning 13 years, from 2009 to 2022, to evaluate the ability of LLMs to reach political consensus among divergent party positions across diverse parliament settings. Specifically, EuroCon incorporates four factors to build each simulated parliament setting: specific political issues, political goals, participating parties, and power structures based on seat distribution.
We also develop an evaluation framework for EuroCon that simulates real voting outcomes in different parliament settings and assesses whether LLM-generated resolutions meet predefined political goals. Our experimental results demonstrate that even state-of-the-art models still perform unsatisfactorily on complex tasks, highlighting EuroCon's promise as an effective platform for studying LLMs' ability to find political consensus.
Submission Number: 3