Interpreting Emergent Military Tactics in a General AlphaZero Framework

ICLR 2026 Conference Submission 18455 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: reinforcement learning, alphazero, military
TL;DR: We use AlphaZero to discover military tactics for battlefield scenarios.
Abstract: This paper presents an approach that combines AlphaZero with convolutional and transformer-based neural network architectures to learn strategies in battlefield-inspired gridworld games. These games feature multiple agents organized into competing teams and are designed to balance realism with rapid outcomes. To encourage effective coordination among agents, we investigate different reward-shaping methods and evaluate their impact on emergent teamwork. We analyze the learned strategies at a tactical level to gain insight into multi-agent collaboration and competitive behavior. In particular, the framework provides a testbed for studying how military-style strategies can emerge from self-play. Through a series of comparative studies, we further break down the contributions of architectural components and training methodologies to demonstrate the effectiveness of this approach for decision-making in dynamic adversarial settings.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 18455