FPDou: Mastering DouDizhu with Fictitious Play

ICLR 2026 Conference Submission 13018 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: DouDizhu, deep reinforcement learning, fictitious play
TL;DR: A theoretically plausible and empirically effective algorithm for the game of DouDizhu.
Abstract: DouDizhu is a challenging three-player imperfect-information game involving both competition and cooperation. Despite strong performance, existing methods are developed primarily with reinforcement learning (RL) without closely examining the stationarity assumption. Specifically, DouDizhu's three-player nature requires algorithms to approximate Nash equilibria, yet existing methods typically update all players' strategies simultaneously. This creates a non-stationary environment that impedes RL-based best-response learning and hinders convergence to Nash equilibria. Inspired by Generalized Weakened Fictitious Play (GWFP), we propose FPDou. More specifically, to make GWFP applicable, we adopt a perfect-training-imperfect-execution paradigm: we treat the two Peasants as one player by sharing information during training, which converts DouDizhu into a two-player zero-sum game amenable to GWFP's analysis. To mitigate the resulting training-execution gap, we introduce a regularization term that penalizes the discrepancy between the policies under perfect and imperfect information. To make learning efficient, we design a practical implementation that consolidates RL and supervised learning into a single step, eliminating the need to train two separate networks. To address non-stationarity, we alternate on-policy and off-policy updates; this not only preserves stationarity for $\epsilon$-best-response learning but also enhances sample efficiency by reusing data for both sides. FPDou achieves a new state of the art: with a 3$\times$ smaller model and no handcrafted features, it outperforms DouZero and PerfectDou in both win rate and score, and it ranks first among 452 bots on the Botzone platform. An anonymous demo and code are provided for reproducibility.
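For context, GWFP (Leslie & Collins, 2006) generalizes fictitious play by allowing approximate best responses and perturbed updates: the strategy profile evolves as

$$\pi_{t+1} \in (1 - \alpha_{t+1})\,\pi_t + \alpha_{t+1}\bigl(\mathrm{BR}_{\epsilon_t}(\pi_t) + M_{t+1}\bigr),$$

where $\mathrm{BR}_{\epsilon_t}$ denotes the set of $\epsilon_t$-best responses and $M_{t+1}$ is a perturbation term. With $\alpha_t \to 0$, $\sum_t \alpha_t = \infty$, $\epsilon_t \to 0$, and suitably vanishing perturbations, the process converges to the set of Nash equilibria in two-player zero-sum games, which is why reducing DouDizhu to such a game matters for the analysis.

As an illustration of the discrepancy penalty mentioned in the abstract, the sketch below shows one plausible form: a KL term pulling the imperfect-information execution policy toward the perfect-information training policy. The function name, the choice of KL divergence, and the weight `beta` are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def discrepancy_penalty(logits_imperfect: torch.Tensor,
                        logits_perfect: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """Hypothetical regularizer penalizing the gap between the policy
    computed from imperfect information (used at execution time) and the
    policy computed from perfect information (available during training).
    This is an illustrative guess at the paper's term, not its exact form."""
    log_p_imp = F.log_softmax(logits_imperfect, dim=-1)
    p_perf = F.softmax(logits_perfect, dim=-1)
    # KL(perfect || imperfect): penalize the imperfect-information policy
    # for diverging from what the perfect-information policy would play.
    kl = F.kl_div(log_p_imp, p_perf, reduction="batchmean")
    return beta * kl
```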
Primary Area: reinforcement learning
Submission Number: 13018