Evaluating the Robustness of Chinchilla Compute-Optimal Scaling

13 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: scaling laws, compute-optimal scaling, language models, large language models, robustness analysis
TL;DR: We investigate the robustness of Chinchilla compute-optimal scaling for language models
Abstract: Hoffmann et al.'s (2022) Chinchilla paper introduced the principle of compute-optimal scaling, laying a foundation for subsequent scaling of language models. In the years since, however, valid concerns about Chinchilla have been raised: wide confidence intervals, discrepancies between its three approaches, and incongruities with other scaling laws. This raises a critical question for the field: Can practitioners still rely on Chinchilla's prescriptions? Our work demonstrates that the answer is yes. We begin by uncovering that the model parameter counts central to Chinchilla's analyses were ambiguous: three interpretations are possible, with relative differences between interpretations as high as 15.2\%. We find that, perhaps surprisingly, the choice of interpretation does not meaningfully affect the key results: the scaling law estimates and the compute-optimal tokens-to-parameter ratio. Indeed, under one interpretation, the tokens-to-parameter ratio becomes more nearly constant across target compute budgets. We then ask how distorted the Chinchilla model parameters \textit{could} have been without meaningfully affecting the key results. By deliberately perturbing model parameters in four structured ways, we find that the key Chinchilla results are most sensitive to additive or systematic errors, which can alter the otherwise flat trend of the optimal tokens-to-parameter ratio; overall, however, Chinchilla's key results withstand sizable perturbations. Altogether, our findings offer the field renewed confidence in Chinchilla as a durable guide for scaling language models.
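For readers unfamiliar with the setup, the quantities the abstract refers to follow the standard Chinchilla parametric form of Hoffmann et al. (2022); the equations below are background notation only, not this submission's fitted values:

% Chinchilla-style parametric loss; E, A, B, \alpha, \beta are constants
% fitted to training runs with N parameters and D training tokens.
\[
  L(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]
% Minimizing L under a fixed compute budget (approximated as C \approx 6ND)
% yields power-law allocations of compute to parameters and tokens,
\[
  N^{*}(C) \propto C^{a}, \qquad D^{*}(C) \propto C^{b}, \qquad a + b = 1,
\]
% so the compute-optimal tokens-to-parameter ratio D^{*}/N^{*} is roughly
% constant in C when a \approx b \approx 0.5, as in Chinchilla's estimates.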
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4902