Keywords: Equilibrium selection, Strategic reasoning with LLMs, Mechanism design, Algorithmic collusion, Learning dynamics in strategic settings
TL;DR: We use reinforcement learning to train a meta-controller that determines when and how to nudge LLM agents toward welfare-improving equilibria — without game-specific engineering.
Abstract: LLM agents in repeated strategic interactions face an equilibrium selection problem: unassisted populations often coordinate on low-welfare equilibria, while rule-based mediators require game-specific calibration. We propose a learned meta-controller that serves as an empirical equilibrium selector and is trained via reinforcement learning from welfare feedback alone, with no game-specific reward shaping and no access to player internals. We instantiate two variants matched to the credit-assignment structure of the task: PPO for multi-round social dilemmas, and a contextual bandit ($\gamma=0$) for settings with dense per-round rewards. Evaluated on classical matrix games, PPO significantly outperforms no-intervention and always-intervene baselines while matching mid-tier hand-crafted mediators without game-specific tuning. Most strikingly, both variants exhibit emergent selectivity: they are active in coordination-challenged games and passive where LLMs already self-coordinate, confirming that this behavior arises from the welfare reward design rather than the choice of algorithm. Bertrand price competition further validates the credit-assignment principle: the bandit underperforms PPO where multi-round credit matters (social dilemmas) but matches rule-based baselines where per-round reward suffices (Bertrand), correctly learning passivity when firms self-regulate. A key finding is that prompt match matters: a controller beats the best rule-based baseline in its best-matched condition but generalizes poorly across prompts, motivating prompt-conditioned training.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Paper Type: Standard paper
Submission Number: 24
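Note on the credit-assignment distinction in the abstract: the following is a minimal sketch, not the authors' implementation, of why setting $\gamma=0$ turns the learning target into a per-round (bandit-style) signal. The function name `discounted_returns`, the welfare values, and the choice of $\gamma=0.9$ for the multi-round case are all illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> list[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every round t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each round can reuse the return of the next round.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical per-round welfare signals (not data from the paper).
welfare = [1.0, 0.0, 3.0, 2.0]

# gamma = 0: each round's target is only that round's welfare,
# which is the contextual-bandit setting.
bandit_targets = discounted_returns(welfare, gamma=0.0)

# gamma > 0 (0.9 is an assumed value): earlier rounds receive credit
# for welfare realized in later rounds.
multi_round_targets = discounted_returns(welfare, gamma=0.9)

print(bandit_targets)
print(multi_round_targets)
```

With $\gamma=0$ the targets equal the per-round welfare, so an intervention is judged only by its immediate effect. With $\gamma>0$ an early intervention is also credited for later welfare, which is the property a multi-round learner such as PPO relies on in social dilemmas.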