Keywords: Multi-Agent System, LLM, Strategic Planning, Lifelong Learning
TL;DR: HexMachina is a self-evolving LLM system that learns Catan from scratch, preserves code artifacts, and refines strategies across episodes. It outperforms prompt-based agent baselines and surpasses the strongest human-crafted bot, demonstrating effective continual learning.
Abstract: We address the long-horizon limitations of large language model (LLM) agents by enabling them to sustain coherent strategies in adversarial, stochastic environments. Settlers of Catan provides a challenging benchmark: strategic success depends on balancing short- and long-term goals in the face of dice randomness, trading, expansion, and blocking. This is difficult because prompt-centric LLM agents (e.g., ReAct, Reflexion) must re-interpret large, evolving game states every turn, quickly saturating context windows and failing to maintain a consistent strategy across episodes. We propose HexMachina, a continual learning multi-agent system that separates environment discovery (inducing an adapter layer without documentation) from strategy improvement (evolving a compiled player). This architecture preserves executable artifacts, letting the LLM focus on high-level strategy design rather than per-turn decision-making. In controlled Catanatron experiments, HexMachina learns from scratch, evolving players that outperform the strongest human-crafted baseline (AlphaBeta). Our best runs achieve a 54% win rate against AlphaBeta, outperforming prompt-driven LLM agents and shallow no-discovery baselines. Ablations further confirm that devoting more of the LLM's effort to pure strategy design improves performance. More broadly, these results suggest that artifact-centric continual learning can transform LLMs from brittle per-turn deciders into stable strategy designers, providing a reusable path toward long-horizon autonomy.
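The artifact-centric loop described in the abstract can be sketched minimally as follows. This is an illustrative toy, not HexMachina's implementation: `ToyEnv`, `evolve`, and the single-parameter "strategy artifact" are all hypothetical stand-ins for the real adapter layer and compiled player; the point is only that the best executable artifact is preserved across episodes while candidates are mutated and evaluated.

```python
# Hypothetical sketch of artifact-centric continual learning (not HexMachina's code).
import random

class ToyEnv:
    """Stand-in for a game environment; win probability depends on one weight."""
    def episode(self, weight):
        # Higher weight wins more often, with noise (a proxy for dice randomness).
        return random.random() < 0.3 + 0.4 * weight

def evolve(generations=20, episodes=50, seed=0):
    random.seed(seed)
    env = ToyEnv()
    best_artifact, best_rate = {"weight": 0.0}, 0.0
    for _ in range(generations):
        # A proposer (the LLM, in the real system) mutates the current artifact.
        w = best_artifact["weight"] + random.uniform(-0.1, 0.2)
        candidate = {"weight": min(1.0, max(0.0, w))}
        # Evaluate the candidate player over many episodes.
        rate = sum(env.episode(candidate["weight"]) for _ in range(episodes)) / episodes
        # Preserve the best executable artifact across generations.
        if rate > best_rate:
            best_artifact, best_rate = candidate, rate
    return best_artifact, best_rate

artifact, rate = evolve()
print(artifact, rate)
```

In this framing the LLM never makes per-turn decisions; it only proposes edits to a persistent artifact, which is then evaluated by running full episodes.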
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 20102