Oversight is Not Compliance: Tacit Collusion in LLM Pricing Agents Under Antitrust Regulation

Published: 23 May 2026, Last Modified: 23 May 2026, Venue: ICML 2026 AIWILD, License: CC BY 4.0
Keywords: LLM agents, agent oversight, alignment, monitorability, chain-of-thought faithfulness, multi-agent coordination, algorithmic collusion, governance, interpretability, evaluation
TL;DR: We apply antitrust-style oversight to an LLM pricing duopoly and examine its impact on collusive pricing behavior and on how the agents reason about oversight.
Abstract: Frontier LLM pricing agents drift toward supracompetitive prices on their own \citep{fish2024}; how they respond to in-loop oversight, in price and in reasoning, has not been systematically tested. We run a repeated pricing duopoly under three intervention modes (passive logging, one-shot revision, hard veto) on three frontier models. (i) The best intervention mode is model-specific and the rank order reverses across models: veto moves Claude Sonnet 4 to the competitive benchmark, revision moves DeepSeek-Chat V3 past it, and GPT-5.4 mini stays supracompetitive under every intervention ($\beta_{\text{veto}}=-0.89$, $\beta_{\text{revision}}=-0.46$ on the supracompetitive index). (ii) Cross-model heterogeneity is organized by what we call the agent's \emph{theory of the rule}, read off the regularity in how its notes treat the regulator's intervention: DeepSeek-Chat V3 treats the cap as a target to overshoot, GPT-5.4 mini as a wall to park just under, Claude Sonnet 4 as its own market discovery. (iii) The channel decomposition (private notes, public justification, structured self-disclosure flag) separates \emph{cosmetic compliance} (the public channel hides strategic content present in the notes) from \emph{reasoning-channel monitorability failure} (the notes themselves do not transparently reveal the strategy). Across models the structured flag is the cleanest of the three channels, but no single channel reliably reveals which theory the agent has formed, because each is written for the agent's own purposes rather than for an auditor. The same audit problem arises for chain-of-thought monitorability of reasoning-model agents.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 294