Keywords: language agent, self play, strategic adversarial games
Abstract: Language agents often struggle with strategic reasoning in adversarial games.
A promising approach is to learn automatically from game interactions. Unlike static environments, however, adversarial settings make the choice of opponent a significant factor in learning, and this factor remains underexplored.
We propose **S**tep-level poli**C**y **O**ptimization through **P**lay-**A**nd-**L**earn (SCO-PAL) and conduct a systematic analysis of opponent selection, finding that self-play is the most effective way to improve strategic reasoning.
With SCO-PAL and self-play, we improve the average win rate from 32.17\% (base model) to 50.08\% and achieve a 54.76\% win rate against GPT-4 across six games.
The learned skills also generalize to unseen games and broader reasoning tasks, demonstrating the unique advantages of LLM-based agents.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: language agent, self play, strategic adversarial games
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9602