Beyond Reasoning: RL-Policy Guided LLM Inference for Efficient Strategy in Liar’s Poker

Published: 02 Mar 2026 · Last Modified: 20 Apr 2026 · MALGAI · CC BY 4.0
Keywords: multi-agent, self-play, reinforcement learning, large language models
TL;DR: We introduce Solly, the first AI agent to play Liar's Poker (a multi-player, imperfect-information game) at an elite human level, and use it to probe RL-policy-guided LLM inference.
Abstract: AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, that game's multi-player dynamics are subdued: most hands converge quickly, with only two players remaining engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar’s Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning (RL) algorithm. Solly played at an elite human level as measured by win rate (winning over 50% of hands) and equity (money won) in both heads-up and multi-player Liar’s Poker, and it outperformed frontier large language models (LLMs) on the same metrics. In one extension, providing Solly's policies to LLMs as a domain-specific guide for inference improved LLM performance and reduced token costs. This result suggests that low-cost RL agents can be used to boost LLM efficiency and reasoning capabilities.
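To make the policy-guided inference idea concrete, the sketch below shows one plausible way an RL policy's action distribution could be surfaced to an LLM at inference time. It is a minimal illustration, not the paper's actual interface: the names (`GameState`, `top_k_actions`, `guided_prompt`), the toy policy, and the prompt format are all our assumptions.

```python
# Hypothetical sketch of RL-policy guided LLM inference. All identifiers and
# the prompt format are assumptions for illustration, not the paper's code.

from dataclasses import dataclass

@dataclass
class GameState:
    """Toy Liar's Poker state: this player's digits and the bidding history."""
    hand: str           # e.g. the eight digits of a banknote serial number
    history: list[str]  # bids made so far, e.g. ["3 5s", "4 2s"]

def top_k_actions(policy, state: GameState, k: int = 3) -> list[tuple[str, float]]:
    """Query a trained RL policy for its k most probable actions.

    Here `policy` is assumed to map a state to {action: probability};
    a real actor-critic agent would run a network forward pass instead.
    """
    probs = policy(state)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

def guided_prompt(state: GameState, suggestions: list[tuple[str, float]]) -> str:
    """Embed the RL policy's top actions in the LLM prompt as a domain guide.

    The LLM still makes the final call, but it reasons over a short,
    pre-filtered action list rather than the full bid space, which is one
    plausible mechanism for the reported drop in token costs.
    """
    lines = ", ".join(f"{a} (p={p:.2f})" for a, p in suggestions)
    return (
        f"You are playing Liar's Poker. Your hand: {state.hand}. "
        f"Bids so far: {state.history}. "
        f"A trained self-play agent rates these actions highest: {lines}. "
        "Choose one action and answer with the action only."
    )

# Example usage with a stubbed policy; a real setup would substitute a
# trained actor-critic network and send the prompt to an LLM of choice.
toy_policy = lambda s: {"4 5s": 0.55, "challenge": 0.30, "5 2s": 0.15}
state = GameState(hand="47380951", history=["3 5s", "4 2s"])
print(guided_prompt(state, top_k_actions(toy_policy, state)))
```

Under these assumptions, the LLM's remaining work is to arbitrate among a handful of policy-ranked candidates, which is consistent with the abstract's claim that guidance both improves performance and decreases token usage.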
Submission Number: 79