The Expanded Othello AI Arena: Evaluating Intelligent Systems Through Constrained Adaptation to Unseen Conditions
Abstract: The ability to rapidly adapt to environmental changes is a core requirement for Artificial General Intelligence (AGI), yet most AI benchmarks evaluate performance in static environments. We present the Expanded Othello AI Arena, a benchmark designed to measure Skill-Acquisition Efficiency: the rate at which agents discover latent objectives and converge to effective strategies within a limited interaction budget. The Arena formalizes a spectrum of 56 environments using a parametric framework $\mathcal{E} = (\mathcal{L}, \mathcal{C})$, where $\mathcal{L}$ defines Othello board geometries and $\mathcal{C}$ represents latent winning conditions via a disc-ratio threshold $K$. This parameterization requires agents to infer the terminal rules through direct interaction while simultaneously interpreting the opponent's behavior. In narrow regimes, agents must induce the opponent to violate the hidden threshold in order to secure victory. Unlike traditional evaluation, the Arena imposes a strict 2,000-game interaction budget to prioritize sample efficiency over asymptotic optimization. We establish the benchmark's utility through a neuroevolutionary adaptive-Minimax baseline that uses meta-learned spatial priors and adaptive weighting. Our empirical analysis reveals that while this baseline achieves competitive performance in standard and inverse regimes, it fails in narrow-interval regimes that demand adversarial inducement, exposing a substantial efficiency gap that gradient-based reinforcement learning cannot bridge even with five times the interaction budget. Released as an extensible Python-based research toolkit, the Arena provides a standardized platform for exploring research directions including test-time learning, in-context learning, and world models. The code is available at: \url{https://anonymous.4open.science/r/ExpandedOthello/}.
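To make the parameterization concrete, the following is a minimal Python sketch of how a latent disc-ratio condition and the 2,000-game budget might be represented. It is not taken from the released toolkit: the names `WinCondition` and `InteractionBudget`, the closed-interval form of the condition, and the example bounds are all assumptions made for illustration, since the abstract does not specify how $K$ maps to a win rule or how ties are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinCondition:
    """Hypothetical latent condition: a player wins when their final
    disc ratio lies in the closed interval [low, high]."""
    low: float
    high: float

    def satisfied(self, own_discs: int, opp_discs: int) -> bool:
        total = own_discs + opp_discs
        if total == 0:
            return False  # no discs on the board, nothing to evaluate
        ratio = own_discs / total
        return self.low <= ratio <= self.high


class InteractionBudget:
    """Counts games played against a fixed budget (2,000 in the abstract)."""

    def __init__(self, max_games: int = 2000) -> None:
        self.max_games = max_games
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_games - self.used

    def consume(self) -> None:
        # Refuse to start a game once the budget is exhausted.
        if self.used >= self.max_games:
            raise RuntimeError("interaction budget exhausted")
        self.used += 1


# Illustrative bounds only; the benchmark's actual thresholds are not given here.
standard_like = WinCondition(low=0.5, high=1.0)
narrow_like = WinCondition(low=0.55, high=0.65)
```

Under these assumed bounds, a 50-to-14 finish satisfies the standard-like condition but overshoots the narrow one, which is the kind of situation in which simply maximizing disc count stops being a winning strategy.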
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 7510