Leaderboard Incentives: Model Rankings under Strategic Post-Training

Published: 02 Mar 2026 · Last Modified: 20 Mar 2026 · ICLR 2026 Workshop AIMS · CC BY 4.0
Keywords: benchmark leaderboard, game theory, strategic classification, mechanism design
Abstract: Influential benchmarks incentivize competing model providers to strategically allocate post-training resources towards improvements on the leaderboard, a phenomenon dubbed \emph{benchmaxxing} or \emph{training on the test task}. In this work, we initiate a principled study of the incentive structure that benchmarks induce. We model benchmarking as a Stackelberg game between a benchmark designer, who chooses an evaluation protocol, and multiple model providers, who compete simultaneously in the subgame induced by the designer's choice. Each competitor has a model of unknown latent quality and can inflate its observed score by allocating resources to benchmark-specific improvements. First, we prove that current benchmarks induce games in which no Nash equilibrium among model providers exists. This result suggests one explanation for why current practice leads to misaligned incentives, prompting model providers to strategize in opaque ways. However, we prove that under mild conditions, a recently proposed evaluation protocol, called \emph{tune-before-test}, induces a benchmark with a unique Nash equilibrium that ranks models by latent quality. This positive result demonstrates that benchmarks need not set bad incentives, even if current evaluations do.
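For illustration only, a minimal formalization of the providers' subgame described in the abstract might read as follows; the symbols $q_i$, $e_i$, $s_i$, $g$, $V$, and $c$ are assumed here for the sketch and are not taken from the paper itself.

% Illustrative sketch, not the paper's notation.
% Leader (benchmark designer): picks an evaluation protocol p from a set P.
% Followers (model providers i = 1, ..., n): given p, simultaneously choose
% benchmark-specific effort e_i >= 0; q_i denotes provider i's latent model quality.
\[
  s_i(p) \;=\; q_i + g(e_i;\, p), \qquad
  u_i \;=\; V\!\bigl(\mathrm{rank}_i(s_1,\dots,s_n)\bigr) \;-\; c(e_i),
\]
% where s_i is the observed leaderboard score, g(e_i; p) is the score inflation the
% protocol p permits, V is the value of a leaderboard position, and c is the cost of
% benchmark-specific tuning. A Nash equilibrium of the subgame is an effort profile
% (e_1^*, ..., e_n^*) with u_i(e_i^*, e_{-i}^*) >= u_i(e_i, e_{-i}^*) for every i and e_i.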
Track: Long Paper
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 55