Keywords: Multi-Agent System, LLM, Strategic Planning, Lifelong Learning
TL;DR: HexMachina is a self-evolving LLM system that learns Catan from scratch, preserves code artifacts, and refines strategies across episodes. It outperforms prompt-based agent baselines and surpasses the strongest human-crafted bot, demonstrating effective continual learning.
Abstract: We address the long-horizon limitations of large language model (LLM) agents by enabling them to sustain coherent strategies in adversarial, stochastic environments. Settlers of Catan provides a challenging benchmark: strategic success depends on balancing short- and long-term goals in the face of dice randomness, trading, expansion, and blocking. This is difficult because prompt-centric LLM agents (e.g., ReAct, Reflexion) must re-interpret large, evolving game states every turn, quickly saturating context windows and failing to maintain a consistent strategy across episodes. We propose HexMachina, a continual learning multi-agent system that separates environment discovery (inducing an adapter layer without documentation) from strategy improvement (evolving a compiled player). This architecture preserves executable artifacts, letting the LLM focus on high-level strategy design rather than per-turn decision-making. In controlled Catanatron experiments, HexMachina learns from scratch, evolving players that outperform the strongest human-crafted baseline (AlphaBeta). Our best runs achieve a 54% win rate against AlphaBeta, outperforming prompt-driven LLM agents and shallow no-discovery baselines. Ablations further confirm that devoting more of the LLM's effort to pure strategy design improves performance. More broadly, these results suggest that artifact-centric continual learning can transform LLMs from brittle per-turn deciders into stable strategy designers, providing a reusable path toward long-horizon autonomy.
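The artifact-centric loop described in the abstract can be sketched minimally as follows. This is an illustrative toy, not HexMachina's implementation: `ToyEnv`, `evolve`, and the single-parameter "strategy artifact" are all hypothetical stand-ins for the real adapter layer and compiled player; the point is only that the best executable artifact is preserved across episodes while candidates are mutated and evaluated.

```python
# Hypothetical sketch of artifact-centric continual learning (not HexMachina's code).
import random

class ToyEnv:
    """Stand-in for a game environment; win probability depends on one weight."""
    def episode(self, weight):
        # Higher weight wins more often, with noise (a proxy for dice randomness).
        return random.random() < 0.3 + 0.4 * weight

def evolve(generations=20, episodes=50, seed=0):
    random.seed(seed)
    env = ToyEnv()
    best_artifact, best_rate = {"weight": 0.0}, 0.0
    for _ in range(generations):
        # A proposer (the LLM, in the real system) mutates the current artifact.
        w = best_artifact["weight"] + random.uniform(-0.1, 0.2)
        candidate = {"weight": min(1.0, max(0.0, w))}
        # Evaluate the candidate player over many episodes.
        rate = sum(env.episode(candidate["weight"]) for _ in range(episodes)) / episodes
        # Preserve the best executable artifact across generations.
        if rate > best_rate:
            best_artifact, best_rate = candidate, rate
    return best_artifact, best_rate

artifact, rate = evolve()
print(artifact, rate)
```

In this framing the LLM never makes per-turn decisions; it only proposes edits to a persistent artifact, which is then evaluated by running full episodes.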
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 20102