Charting the Frontier: How Optimizing Performance Yields Accurate Scaling Laws on a Shoestring

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Successive Halving, Scaling Laws, Budget Allocation, Gaussian Processes
TL;DR: Using a budget allocation strategy such as Successive Halving (SH), alone or paired with surrogate models, enables us to obtain accurate scaling laws at significantly reduced cost.
Abstract: Predicting model performance at larger scales enables the design of training strategies and architectures tailored to specific performance targets. Empirical scaling law research identifies functional forms to aid this prediction task. These describe the relationship between loss and compute using a loss-compute frontier defined by learning curves. Due to the empirical nature of this approach, the computational burden is substantial, making strategic resource allocation essential -- yet it remains surprisingly underexplored. In this work, we address this shortcoming by exploring the suitability of Successive Halving (SH) and SH combined with parametric and non-parametric surrogate models. In addition to enabling a more systematic allocation of a given compute budget, our findings show that SH paired with surrogate models yields a set of learning curves that includes one with a lower loss-compute value than what naive uniform allocation or an SH-only approach can obtain. Our experiments demonstrate mean relative improvements of up to $2.84$% and $5.47$% on real-world and synthetic learning curve datasets. This strategic resource allocation enables us to obtain accurate scaling laws at significantly reduced computational costs, saving up to $98.7$% over the traditional exhaustive approach.
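The Successive Halving loop the abstract refers to can be sketched in a few lines. This is a minimal illustrative version, not the paper's implementation: the candidate set, the halving rate `eta`, and the toy `evaluate` function are all assumptions made for the sake of a self-contained example.

```python
def successive_halving(configs, budget_per_round, evaluate, eta=2):
    """Illustrative Successive Halving (SH) budget allocator.

    Repeatedly evaluate all surviving candidates at the current budget,
    keep the best 1/eta fraction (lowest loss), and grow the per-round
    budget so survivors receive progressively more compute.
    """
    survivors = list(configs)
    while len(survivors) > 1:
        # Score each surviving candidate at the current budget level.
        scores = {c: evaluate(c, budget_per_round) for c in survivors}
        # Retain the top 1/eta fraction by (lowest) loss.
        survivors.sort(key=lambda c: scores[c])
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget_per_round *= eta  # survivors get more compute next round
    return survivors[0]
```

In the scaling-law setting of the paper, each candidate would be a training run whose learning curve is extended by `budget_per_round` units of compute per round; here a deterministic toy loss stands in for that evaluation.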
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15106