PokerSkill: Expert-Level Poker Play from Pure Language Models

ACL ARR 2026 May Submission 15671 Authors

26 May 2026 (modified: 02 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0

Keywords: poker AI, large language models, structured prompting, imperfect-information games, decision-making, game-playing agents

Abstract: Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers, which are extremely costly to train. Large Language Models (LLMs), when asked to play poker directly, perform far below solver-based agents. We introduce \textbf{PokerSkill}, a framework that unearths latent poker skills through structured prompt guidance. A deterministic context engine analyzes the current game state and retrieves only the relevant fragments from a layered skill library, designed entirely by human poker experts, thereby constraining the LLM's choice to reasonable actions. Against GTOWizard, a state-of-the-art GTO benchmark, GPT-5.5 XHigh with PokerSkill achieves $-57 \pm 21$ mbb/hand, Claude Opus 4.6 achieves $-80 \pm 29$ mbb/hand, and Claude Opus 4.7 achieves $-87 \pm 64$ mbb/hand, reducing losses by 49--63\% compared to default-prompt baselines and outperforming the strong bot Slumbot. To our knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries at inference time. The framework is fully specified for independent replication and improves automatically as base models advance.
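
To make the retrieval step in the abstract concrete, here is a minimal sketch of what a deterministic context engine over a layered skill library could look like. It is an illustration only, not the authors' implementation: the `GameState` fields, the `SKILL_LIBRARY` entries, the layer numbers, and the functions `select_fragments` and `build_prompt` are all hypothetical, and the fragment texts are placeholders rather than real strategy advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    """Hypothetical, heavily simplified view of a poker decision point."""
    street: str        # "preflop", "flop", "turn", or "river"
    facing_bet: bool   # True if the opponent has bet or raised
    position: str      # "in_position" or "out_of_position"


# Hypothetical layered library. Each fragment lists the conditions under
# which it applies; the text fields are placeholders, not real advice.
SKILL_LIBRARY = [
    {"id": "core", "layer": 0, "when": {},
     "text": "[core principles]"},
    {"id": "preflop", "layer": 1, "when": {"street": "preflop"},
     "text": "[preflop guidance]"},
    {"id": "flop", "layer": 1, "when": {"street": "flop"},
     "text": "[flop guidance]"},
    {"id": "flop_facing_bet", "layer": 2,
     "when": {"street": "flop", "facing_bet": True},
     "text": "[flop guidance when facing a bet]"},
    {"id": "river_oop", "layer": 2,
     "when": {"street": "river", "position": "out_of_position"},
     "text": "[river guidance out of position]"},
]


def select_fragments(state: GameState) -> list[dict]:
    """Return the fragments whose conditions all match, general layers first.

    No randomness and no model call is involved, so the same state
    always yields the same fragments in the same order.
    """
    facts = {
        "street": state.street,
        "facing_bet": state.facing_bet,
        "position": state.position,
    }
    matched = [
        frag for frag in SKILL_LIBRARY
        if all(facts[key] == value for key, value in frag["when"].items())
    ]
    # sorted() is stable, so fragments within a layer keep library order.
    return sorted(matched, key=lambda frag: frag["layer"])


def build_prompt(state: GameState, legal_actions: list[str]) -> str:
    """Assemble the prompt from the selected fragments and the legal actions."""
    lines = [frag["text"] for frag in select_fragments(state)]
    lines.append("Legal actions: " + ", ".join(legal_actions))
    lines.append("Choose exactly one legal action.")
    return "\n".join(lines)
```

In this sketch, a flop decision facing a bet would pull in the core, flop, and flop-facing-bet fragments and nothing else, so the prompt stays short and the model sees only guidance that applies to the current spot. Listing the legal actions explicitly is one simple way to constrain the model's choice; the paper's actual mechanism may differ.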

Paper Type: Long

Research Area: LLM agents

Research Area Keywords: planning in agents, LLM-based controllers, prompting

Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: yes

Reassignment Request Area Chair: This is not a resubmission

Reassignment Request Reviewers: This is not a resubmission

Data: zip

Visa Needs: yes

Country Of Origin: CN

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

A2 Elaboration: Ethics Statement

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 4

B2 Discuss The License For Artifacts: No

B2 Elaboration: GTOWizard is a commercial benchmark service accessed via API; Slumbot is a publicly available research bot. Neither has a standard open-source license to discuss. Our framework code license will be specified upon release.

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: Section 4

B4 Data Contains Personally Identifying Info Or Offensive Content: N/A

B4 Elaboration: No human data is collected.

B5 Documentation Of Artifacts: Yes

B5 Elaboration: Section 3

B6 Statistics For Data: Yes

B6 Elaboration: Section 4

C Computational Experiments: Yes

C1 Model Size And Budget: No

C1 Elaboration: We use commercial LLM APIs (GPT-5.5, Claude Opus 4.6/4.7) whose parameter counts are not publicly disclosed. Section 4.1 reports per-hand API costs ($0.07-$0.30/hand) and Section 4.3 discusses cost-performance tradeoffs. No GPU compute is used by our framework.

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Section 4

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 4

C4 Parameters For Packages: N/A

C4 Elaboration: No standard NLP packages are used.

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: Yes

E1 Elaboration: Ethics Statement

Author Submission Checklist: yes

Submission Number: 15671
