Team of Rivals: Hierarchical Deep Reinforcement Learning and Behavior Cloning for Multiplayer Poker

Published: 19 Dec 2025 · Last Modified: 05 Jan 2026 · AAMAS 2026 Full · CC BY 4.0
Keywords: deep reinforcement learning, behavioral cloning, poker
Abstract: Multiplayer no‑limit Texas Hold’em is considered a challenging benchmark for AI algorithms due to the need for decision making under partial information, strategic deception, and non‑stationary opponents. Classical equilibrium‑based techniques do not extend cleanly to the multiplayer setting, and prevailing multiplayer solutions, such as LLMs, tend to be computationally intensive. This study introduces Havoc, a hierarchical deep RL approach that combines behavior cloning of individual human experts with a value‑based master policy that selects, at each decision point, which specialist to deploy. By preserving distinct human play styles in the specialist policies and learning when to deploy them, Havoc adapts its strategy rapidly as table conditions shift. Despite limited training data, Havoc attains strong multiplayer performance, outperforming current state‑of‑the‑art methods while requiring substantially fewer computational resources.
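The hierarchical control loop the abstract describes can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: the specialist policies, the toy observation, and the tabular value update are all assumptions standing in for behavior-cloned networks and the actual master policy.

```python
import random

# Illustrative sketch of a hierarchical agent: behavior-cloned specialist
# policies plus a value-based master that picks which specialist acts at
# each decision point. All names and logic here are hypothetical; real
# specialists would be networks cloned from individual human experts.

def aggressive_specialist(obs):
    # Stand-in for a cloned loose-aggressive play style.
    return "raise"

def conservative_specialist(obs):
    # Stand-in for a cloned tight-passive play style.
    return "fold" if obs["pot_odds"] < 0.3 else "call"

SPECIALISTS = [aggressive_specialist, conservative_specialist]

class MasterPolicy:
    """Toy value-based selector over specialists (stateless, for brevity;
    a real master would condition on the table state)."""
    def __init__(self, n_specialists, eps=0.1, lr=0.5):
        self.q = [0.0] * n_specialists   # estimated value of each specialist
        self.eps, self.lr = eps, lr

    def select(self):
        # Epsilon-greedy choice of which specialist to deploy.
        if random.random() < self.eps:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda i: self.q[i])

    def update(self, idx, reward):
        # One-step value update; a full agent would bootstrap on the
        # next state instead of using the immediate reward alone.
        self.q[idx] += self.lr * (reward - self.q[idx])

def play_decision(master, obs, reward_fn):
    idx = master.select()                 # master picks a specialist
    action = SPECIALISTS[idx](obs)        # specialist picks the action
    reward = reward_fn(action)
    master.update(idx, reward)            # master learns when to deploy whom
    return action, reward
```

In this toy setup the master quickly concentrates on whichever specialist earns more at the current table, which is the adaptation mechanism the abstract attributes to Havoc.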
Area: Engineering and Analysis of Multiagent Systems (EMAS)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1389