A unified metric of generalization across humans and machines

TMLR Paper6637 Authors

25 Nov 2025 (modified: 01 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: Generalization means performing well in situations that differ from those seen during learning. Accuracy alone cannot tell whether a system truly generalizes, because a model can be correct yet fragile or misaligned with the structure of a task \cite{entry1}. We introduce $\mathrm{GR}^{\star}$, a single reproducible metric that measures not only performance but also stability and structural alignment while accounting for data, scale, and abstraction cost. $\mathrm{GR}^{\star}$ is designed to be simple, deterministic, and fair across humans and machines, allowing both to be compared under the same coordinate system. All evaluations follow a lightweight standardized pipeline with fixed hyperparameters and no distributed training, ensuring transparency and reproducibility. This work turns generalization from an abstract concept into a measurable and falsifiable property, offering a unified and interpretable way to understand how different systems learn. Code: \url{https://github.com/JerryHuang20030919/GR_Star_Unified_Metric}.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Seungjin_Choi1
Submission Number: 6637