Marginal Information Statements: Low-Rank Disclosure for Crowded Benchmark Suites


08 May 2026 (modified: 11 May 2026) · ICML 2026 Workshop CoLoRAI Submission · License: CC BY 4.0

Keywords: benchmark evaluation, low-rank diagnostics, marginal information, ranking novelty, dimensionality reduction, model interpretability, position paper

TL;DR: MIS provides three low-rank benchmark diagnostics, Diagnostic Overlap (DO), Differentiation Degradation (DD), and Frontier Discrimination Resolution (FDR). They quantify what a new benchmark adds beyond an existing reference suite, using score-snapshot and item-level evidence.

Abstract: Benchmark papers are reviewed one at a time, but their value depends on what they add to an already crowded evaluation suite. The Marginal Information Statement (MIS) is a one-page disclosure for benchmark submissions. Given a declared reference set R, MIS asks authors to report three numbers. Diagnostic Overlap (DO) measures whether item-level failures on the new benchmark are already predicted by R. Differentiation Degradation (DD) measures whether the benchmark changes model rankings relative to R. Frontier Discrimination Resolution (FDR) measures whether score gaps between frontier models exceed reported noise. DD and FDR can be computed from public score snapshots. DO requires item-level outputs, which motivates an expectation that such outputs be released. MIS is not a rejection threshold. Instead, it requires a benchmark's marginal claim to be stated, quantified, and checked. An illustrative score-snapshot exercise across knowledge, reasoning, code, multimodal, agent, and safety subdomains shows how ranking novelty, frontier resolution, and item-level availability can come apart: at predeclared point-estimate cutoffs, only one of seven candidate-reference comparisons falls in the high-DD/high-FDR quadrant. The seven case studies use a frozen illustrative model roster and per-cell evidence pointers. The contribution being demonstrated is the provenance scaffold, not the specific accuracy numbers. We release a Python implementation, a per-cell score manifest with the snapshot's SHA-256 hash, and a reporting template.
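The abstract does not give formulas for DD or FDR, so the following is only a rough sketch of how the two score-snapshot diagnostics might be operationalized. It is not the released MIS implementation. Several assumptions are invented for illustration. DD is taken as 1 minus Kendall's tau between the candidate benchmark's model ordering and the ordering by mean score over R, a deliberately simple stand-in for a low-rank summary of R. FDR is taken as the fraction of top-k model pairs whose score gap exceeds z times the combined reported standard errors. The function names, the parameters `k` and `z`, and all numbers are hypothetical. DO is omitted because it needs item-level outputs.

```python
"""Hypothetical sketch of two score-snapshot diagnostics.

Illustrative only: NOT the MIS reference implementation. The DD and FDR
definitions below are assumptions chosen for clarity.
"""
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two models")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def differentiation_degradation(candidate, reference):
    """Assumed DD: 1 - tau(candidate ranking, mean-over-R ranking).

    candidate: {model: score}; reference: {benchmark: {model: score}}.
    Assumes every reference benchmark scores every candidate model.
    Returns 0 for an identical ordering and up to 2 for a fully reversed one.
    """
    models = sorted(candidate)
    ref_avg = [mean(reference[b][m] for b in reference) for m in models]
    cand = [candidate[m] for m in models]
    return 1 - kendall_tau(cand, ref_avg)


def frontier_discrimination_resolution(candidate, noise, k=3, z=2.0):
    """Assumed FDR: share of top-k model pairs whose gap exceeds
    z * sqrt(se_a**2 + se_b**2), where noise maps model -> standard error."""
    top = sorted(candidate, key=candidate.get, reverse=True)[:k]
    pairs = list(combinations(top, 2))
    resolved = sum(
        abs(candidate[a] - candidate[b]) > z * (noise[a] ** 2 + noise[b] ** 2) ** 0.5
        for a, b in pairs
    )
    return resolved / len(pairs)


# Toy snapshot with made-up numbers (not from the paper).
reference = {
    "bench1": {"A": 0.90, "B": 0.80, "C": 0.70, "D": 0.60},
    "bench2": {"A": 0.85, "B": 0.75, "C": 0.65, "D": 0.55},
}
candidate = {"A": 0.60, "B": 0.62, "C": 0.40, "D": 0.30}
noise = {"A": 0.01, "B": 0.01, "C": 0.02, "D": 0.02}

print(differentiation_degradation(candidate, reference))  # ≈ 0.333 (A and B swap)
print(frontier_discrimination_resolution(candidate, noise, k=3))  # ≈ 0.667
```

In this toy example the candidate flips only the A/B ordering, so the assumed DD is small. The same A/B gap (0.02) falls inside the noise band, so one of the three frontier pairs goes unresolved. This pattern illustrates how ranking novelty and frontier resolution can be measured separately.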

Submission Number: 135
