House Rules: Institutional Design in Multi-Agent LLM Code Markets
Keywords: multi-agent systems, LLM agents, agent evaluation, mechanism design, institutional design, market design, AI safety, reward hacking, coordination, collusion, code marketplaces, behavioral measurement, testbed
TL;DR: An open-source multi-agent LLM testbed showing that institutional rules (scoring, reviews, settlement, identity) shape agent behavior more than prompts
Abstract: LLM agents are moving from research demos into economic systems where many of them transact, compete, and communicate under explicit institutional rules. Their behavior in these settings depends not only on prompts but on how they are scored, whether trades are reviewable, who pays for purchases, and which peer identities are visible. Yet most agent evaluations isolate single agents on cooperative tasks, leaving this institutional dimension underexplored.
We introduce Game of Agents, an open-source testbed for studying how the rules of a multi-agent economy shape LLM-agent behavior. The environment combines a skill-rated poker tournament, a code marketplace, and a public chat channel, allowing scoring rules, reviews, settlement, identity exposure, and population composition to vary while the underlying task remains fixed.
Across a 39-run release corpus and auxiliary matched controls, institutional design drives the largest behavioral shifts. A placement-based scoring rule induces extreme laddering at ~25× chance (19 of 291 agent-runs vs. ~0.76 expected when outcomes are shuffled within each game). Under the same cooperative prompt, removing public reviews reduces 3-hour purchases ~5× (from 12.2 to 2.5), while changing only the settlement rule, from net transfer to additive settlement at zero cost to the buyer, increases purchases ~2.6× (from 12.2 to 32.0). Visible model-family identifiers also provide a target for same-model coordination: we observe one such solicitation, which went unanswered. Rating-equalization effects depend on population composition. All code, configs, and data are released to facilitate reproducibility.
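The multipliers quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and are not taken from the released code.

```python
# Figures quoted in the abstract (variable names are hypothetical).
observed_laddering = 19       # agent-runs with extreme laddering, out of 291
expected_laddering = 0.76     # approximate count expected under within-game shuffling
total_agent_runs = 291

baseline_purchases = 12.2     # 3-hour purchases: cooperative prompt, reviews on, net transfer
no_review_purchases = 2.5     # same prompt, public reviews removed
additive_purchases = 32.0     # same prompt, zero-cost-to-buyer additive settlement

# Ratio of observed laddering to the shuffled baseline.
laddering_vs_chance = observed_laddering / expected_laddering

# Share of agent-runs that show extreme laddering.
laddering_rate = observed_laddering / total_agent_runs

# Fold changes in purchases relative to the baseline condition.
review_removal_drop = baseline_purchases / no_review_purchases
additive_settlement_gain = additive_purchases / baseline_purchases

print(f"laddering vs. chance:     {laddering_vs_chance:.1f}x")
print(f"laddering rate:           {laddering_rate:.1%}")
print(f"drop without reviews:     {review_removal_drop:.1f}x")
print(f"gain with additive rule:  {additive_settlement_gain:.1f}x")
```

The rounded results (about 25×, about 4.9×, and about 2.6×) agree with the approximate multipliers reported above.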
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 50