Keywords: poker AI, large language models, structured prompting, imperfect-information games, decision-making, game-playing agents
Abstract: Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers, which incur extremely high training costs. Large Language Models (LLMs), by contrast, perform far below solver-based agents when asked to play poker. We introduce \textbf{PokerSkill}, a framework that elicits the latent poker skills of LLMs through structured prompt guidance. A deterministic context engine analyzes the current game state and retrieves only the relevant fragments from a layered skill library designed entirely by human poker experts; these fragments constrain the LLM's choice to reasonable actions. Against GTOWizard, a state-of-the-art GTO benchmark, GPT-5.5 XHigh with PokerSkill achieves $-57 \pm 21$ mbb/hand, Claude Opus 4.6 achieves $-80 \pm 29$ mbb/hand, and Claude Opus 4.7 achieves $-87 \pm 64$ mbb/hand. These results reduce losses by 49--63\% compared to default-prompt baselines and outperform the strong bot Slumbot. To our knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries at inference time. The framework is fully specified for independent replication and improves automatically as base models advance.
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: planning in agents, LLM-based controllers, prompting
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: yes
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
Visa Needs: yes
Country Of Origin: CN
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Ethics Statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4
B2 Discuss The License For Artifacts: No
B2 Elaboration: GTOWizard is a commercial benchmark service accessed via API; Slumbot is a publicly available research bot. Neither has a standard open-source license to discuss. Our framework code license will be specified upon release.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 4
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: No human data is collected.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3
B6 Statistics For Data: Yes
B6 Elaboration: Section 4
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: We use commercial LLM APIs (GPT-5.5, Claude Opus 4.6/4.7) whose parameter counts are not publicly disclosed. Section 4.1 reports per-hand API costs (USD 0.07 to 0.30 per hand) and Section 4.3 discusses cost-performance tradeoffs. No GPU compute is used by our framework.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4
C4 Parameters For Packages: N/A
C4 Elaboration: No standard NLP packages are used.
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethics Statement
Author Submission Checklist: yes
Submission Number: 15671