A Minimal Decision Capacity Threshold Prevents Catastrophic Exploitation in Self-Play RL

Arahan Kujur

Published: 07 Jun 2026, Last Modified: 09 Jun 2026

Venue: ICML 2026 Workshop

Readers: Everyone

License: CC BY 4.0

Keywords: self-play reinforcement learning, multi-agent reinforcement learning, decision capacity, asymmetric action-space perturbation, Q-learning, co-adaptation, catastrophic exploitation, Nash equilibrium, Kuhn poker, Leduc poker, robustness, structural instability, game theory

TL;DR: In self-play RL on Kuhn and Leduc Poker, removing all of one player's decision points triggers catastrophic exploitation, but preserving even a single decision point keeps learning near Nash equilibrium.

Abstract: We show that a minimal threshold in decision capacity determines whether self-play reinforcement learning agents collapse under asymmetric rule perturbations. In Kuhn and Leduc Poker, we remove Player 0's ability to bet or raise—either at all decision nodes or only at the opening move. Across five seeds with paired card deals: (i) removing the bet/raise action at all decision nodes (capacity 0 in Kuhn; residual capacity $>0$ in Leduc, where fold/check-call remain) causes adaptive Q-learning to collapse toward exploitation (Kuhn: $-0.93$; Leduc: $-0.31$), while a frozen Q-learning baseline stays near $-0.14$, confirming the collapse is co-adaptation-driven; (ii) preserving a single decision point stabilises Q-learning near Nash equilibrium (Kuhn: $-0.07$; Leduc: $-0.10$); (iii) the pattern is timing-invariant: early, mid, and late perturbation produce identical collapse severity; (iv) collapse is fast, occurring within four episodes on average in Kuhn. These results reveal a structural instability in learning dynamics: equilibrium behaviour becomes unsustainable once agents lose all contingent responses. We frame the collapse as a learning-dynamics instability measured against the original-game Nash value (Kuhn: $-1/18 \approx -0.056$; Leduc: $\approx -0.087$), not as a claim that the exploiting opponent behaves irrationally. We provide empirical evidence for a sharp threshold effect between zero and minimal decision capacity, observed consistently across two small imperfect-information poker games rather than being specific to Kuhn.
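As a sanity check on the reference values quoted in the abstract, the following minimal sketch computes Player 0's expected payoff in standard Kuhn Poker from explicit behavioural strategies, using exact rational arithmetic. It is not the authors' code. The function name `kuhn_value`, the card encoding (0 = J, 1 = Q, 2 = K), and the reading of "capacity 0" as a Player 0 who can neither bet nor call are assumptions made for illustration; the equilibrium profile used is the well-known Kuhn equilibrium in which Player 0 never opens with a bet.

```python
from fractions import Fraction as F
from itertools import permutations


def kuhn_value(p0_open, p0_call, p1_call, p1_bet):
    """Expected payoff to Player 0 in Kuhn Poker (ante 1, bet size 1).

    Each argument is a list indexed by card (0=J, 1=Q, 2=K):
      p0_open[c]: probability Player 0 opens with a bet
      p0_call[c]: probability Player 0 calls after check, bet
      p1_call[c]: probability Player 1 calls an opening bet
      p1_bet[c]:  probability Player 1 bets after a check
    """
    total = F(0)
    for c0, c1 in permutations(range(3), 2):  # six equally likely deals
        sign = 1 if c0 > c1 else -1           # +1 if Player 0 holds the higher card
        # Player 0 bets: Player 1 folds (+1) or calls (showdown for 2).
        ev_bet = (1 - p1_call[c1]) * 1 + p1_call[c1] * 2 * sign
        # Player 0 checks: Player 1 checks (showdown for 1) or bets,
        # after which Player 0 folds (-1) or calls (showdown for 2).
        ev_facing_bet = (1 - p0_call[c0]) * -1 + p0_call[c0] * 2 * sign
        ev_check = (1 - p1_bet[c1]) * sign + p1_bet[c1] * ev_facing_bet
        total += p0_open[c0] * ev_bet + (1 - p0_open[c0]) * ev_check
    return total / 6


# Equilibrium profile in which Player 0 never opens with a bet.
p1_call_nash = [F(0), F(1, 3), F(1)]
p1_bet_nash = [F(1, 3), F(0), F(1)]
nash_value = kuhn_value([F(0)] * 3, [F(0), F(1, 3), F(1)],
                        p1_call_nash, p1_bet_nash)

# Assumed capacity-0 Player 0 (never bets, never calls) against an
# opponent who always bets after a check.
exploited_value = kuhn_value([F(0)] * 3, [F(0)] * 3,
                             [F(0)] * 3, [F(1)] * 3)

print(nash_value, exploited_value)
```

Under these assumptions the first value is exactly $-1/18$, matching the original-game Nash value quoted above, and the second is $-1$, the worst case that the reported Kuhn collapse ($-0.93$) approaches. Because this equilibrium never opens with a bet, it remains available to Player 0 when only the opening bet is removed, which is consistent with the near-Nash outcome reported for the single-decision-point condition.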

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Paper Type: Standard paper

Submission Number: 10
