The Hawk Effect: Why We Need a Two-Dimensional Measure of Machine Intelligence

Published: 04 Jun 2026, Last Modified: 04 Jun 2026 | Venue: PhilML@ICML 2026 Poster | Readers: Everyone | License: CC BY 4.0
Keywords: Definitions of Intelligence, AI Evaluation, AI Alignment, Epistemic Intelligence, Reinforcement Learning
TL;DR: Our work explores "the hawk effect": the possibility of training adversarial policies against powerful self-play models, and its consequences, in particular the need for an at least two-dimensional definition of "intelligence".
Abstract: Deep reinforcement-learning agents achieve superhuman performance in combinatorially complex domains, yet can collapse when subjected to specialized, dynamically trained adversarial policies---a phenomenon we term the \emph{Hawk Effect}. We argue that this vulnerability is not merely an empirical anomaly but a \emph{structural risk} associated with finite-capacity deployed systems under adaptive attack, and that it forces a philosophical question: what should ``intelligence'' mean for an ML system whose cheap-to-train adversaries can route it from triumph to catastrophe in a few moves? We propose treating deployed machine intelligence as at least two-dimensional, with one axis capturing domain-normalized standard-task competence (Hutter-inspired) and a second axis capturing adaptive adversarial work-to-failure (Pearl-inspired). We propose the \emph{Adversarial Work Criterion} (AWC) as a concrete first instantiation of the second axis. The paper is intentionally argumentative; we devote a section to alternative views.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 16