The Expanded Othello AI Arena: Evaluating Intelligent Systems Through Constrained Adaptation to Unseen Conditions

14 Feb 2026 (modified: 10 May 2026). Decision pending for TMLR. License: CC BY 4.0
Abstract: The ability to rapidly adapt to environmental changes is a core requirement for Artificial General Intelligence (AGI), yet most AI benchmarks evaluate performance in static environments. We present the Expanded Othello AI Arena, a benchmark designed to measure Skill-Acquisition Efficiency: the rate at which agents discover latent objectives and converge to effective strategies within a limited interaction budget. The Arena formalizes a spectrum of 56 environments using a parametric framework $\mathcal{E} = (\mathcal{L}, \mathcal{C})$, where $\mathcal{L}$ defines Othello board geometries and $\mathcal{C}$ represents latent winning conditions via a disc-ratio threshold $K$. This parameterization requires agents to decipher terminal rules through direct interaction while adapting against an opponent in a zero-sum setting. In narrow regimes, agents must precisely control their terminal occupancy under the hidden threshold, and terminal outcomes in which neither player satisfies the admissible interval are treated as draws. Unlike traditional evaluation, the Arena imposes a strict interaction budget to prioritize sample efficiency over asymptotic optimization. We establish the benchmark's utility through a neuroevolutionary adaptive-Minimax baseline that uses meta-learned spatial priors and adaptive weighting. Our empirical analysis reveals that while this baseline achieves competitive performance in standard and inverse regimes, it struggles in narrow-interval regimes that demand precise terminal control under latent objectives. Released as an extensible Python-based research toolkit, the Arena provides a standardized platform for exploring research directions such as test-time learning, in-context learning, and world models. The code is available at: https://anonymous.4open.science/r/ExpandedOthello/
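The terminal rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the released toolkit. The function name `terminal_outcome`, the interval bounds `lo` and `hi`, the definition of a player's disc ratio as its share of all discs on the board, and the choice to score the case where both players fall inside the interval as a draw are all assumptions made for illustration. The abstract states only that outcomes in which neither player satisfies the admissible interval are draws.

```python
from fractions import Fraction


def terminal_outcome(black: int, white: int, lo: Fraction, hi: Fraction) -> str:
    """Return 'black', 'white', or 'draw' for a finished game.

    Assumption: a player satisfies the latent condition when its share of
    the discs on the board lies in the closed interval [lo, hi].
    """
    total = black + white
    if total == 0:
        return "draw"  # empty board: no player can satisfy the condition

    black_ok = lo <= Fraction(black, total) <= hi
    white_ok = lo <= Fraction(white, total) <= hi

    if black_ok and not white_ok:
        return "black"
    if white_ok and not black_ok:
        return "white"
    # Neither player is inside the interval (a draw per the abstract), or
    # both are (treated as a draw here, which is an assumption).
    return "draw"


# Hypothetical intervals for three kinds of regime, with a 40-24 final count.
standard = terminal_outcome(40, 24, Fraction(51, 100), Fraction(1))
inverse = terminal_outcome(40, 24, Fraction(0), Fraction(49, 100))
narrow = terminal_outcome(40, 24, Fraction(55, 100), Fraction(60, 100))
```

Under these assumed intervals, the same 40-24 position is a win for Black in the standard-style regime, a win for White in the inverse-style regime, and a draw in the narrow regime, because Black's share of 0.625 overshoots the upper bound of 0.60. This is the sense in which narrow regimes reward precise control of terminal occupancy instead of simple disc maximization.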
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 7510