Abstract: Scaling laws have become indispensable for guiding the pre-training of large language models, enabling optimal decision-making---such as determining the scale of the model and data---under a fixed compute budget. Standard practice involves fitting parametric functions~(predominantly power laws) to results from small-scale experiments, which allows researchers to extrapolate trends and predict compute-optimal configurations at larger scales. This neural scaling paradigm is fundamentally a specialized instantiation of \textit{Model-Based Optimization} (MBO): constructing a surrogate model (the scaling law) from experimental data to predict validation metrics, and subsequently optimizing pre-training configurations as design variables against this surrogate. Despite this equivalence, the existing literature focuses primarily on neural scaling priors while neglecting the broader MBO perspective. In this position paper, we bridge this gap by formally mapping the neural scaling paradigm to the three stages of MBO---\textit{design space}, \textit{surrogate modeling}, and \textit{guided optimization}---and identify three distinguishing characteristics---low-dimensional spaces, strong power-law priors, and strict compute constraints---that separate it from standard MBO problems. Furthermore, we systematically partition the design space into three subspaces: \textit{model}, \textit{data}, and \textit{hyperparameters}. Crucially, we formalize their relationship as a bi-level optimization problem, wherein hyperparameters are optimized at the lower level to ensure convergence for each specific model and data configuration. To demonstrate the practical utility of adopting MBO techniques, we focus on the surrogate modeling stage and provide an illustrative proof-of-concept by applying \textit{autofocus}---an established MBO technique---to mitigate extrapolation-induced covariate shift.
We conclude with a principled roadmap for future research, highlighting uncertainty quantification and multi-objective optimization as key directions.
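The surrogate-fitting step summarized above---fitting a power law to small-scale runs and extrapolating to a larger scale---can be sketched as follows. This is a minimal illustration, not the paper's method: the model sizes, losses, noise level, and target scale `N_target` are invented for demonstration, and the fit uses a simple log-log least-squares regression.

```python
import numpy as np

# Illustrative pilot-run data: model sizes (parameters) and validation
# losses synthesized from an assumed power law L(N) = A * N^(-alpha).
# These numbers are hypothetical, not from the paper.
rng = np.random.default_rng(0)
A_true, alpha_true = 400.0, 0.10
model_sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = A_true * model_sizes ** (-alpha_true)
losses *= np.exp(rng.normal(0.0, 0.01, size=losses.shape))  # small noise

# Fit the surrogate in log-log space: log L = log A - alpha * log N.
slope, intercept = np.polyfit(np.log(model_sizes), np.log(losses), deg=1)
alpha_fit = -slope
A_fit = np.exp(intercept)

# Extrapolate the fitted scaling law to a larger (hypothetical) scale.
N_target = 1e10
predicted_loss = A_fit * N_target ** (-alpha_fit)
print(f"fitted L(N) ~ {A_fit:.1f} * N^(-{alpha_fit:.3f}); "
      f"predicted loss at N=1e10: {predicted_loss:.3f}")
```

Because the surrogate is queried far outside the range of the pilot runs, such extrapolation induces exactly the covariate shift that techniques like autofocus aim to mitigate.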
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Vidya_Muthukumar3
Submission Number: 8242