From Scaling Law to Sub-Scaling Law: Understanding the Diminishing Returns of Larger Models

ICLR 2025 Conference Submission 13598 Authors

28 Sept 2024 (modified: 13 Oct 2024) | ICLR 2025 Conference Submission | CC BY 4.0
Keywords: scaling law, large language model
Abstract: Traditional scaling laws suggest that the performance of language models improves predictably as model or dataset size increases. However, recent work reports sub-scaling growth in large language models, where performance improvements decelerate as the dataset or model size grows. This study systematically investigates the sub-scaling phenomenon through an extensive empirical analysis of over 400 models, ranging from 20 million to 7 billion parameters, trained with varying datasets and training strategies. We examine the phenomenon from two perspectives: data density and training strategy. Our findings indicate that sub-scaling arises primarily from high data density and non-optimal allocation of training resources, and that both factors contribute more to performance deceleration than previously anticipated. High data density leads to diminishing marginal gains in performance, while optimal resource allocation is crucial for sustaining performance improvements. Finally, we propose a sub-optimal scaling law that generalizes the Chinchilla scaling law to better predict performance and loss in sub-scaling regimes.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13598
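
For readers unfamiliar with the baseline that the abstract says is being generalized, the following is a minimal sketch of the Chinchilla parametric loss, L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D the number of training tokens. It is not the sub-optimal scaling law proposed in the submission, whose form is not given on this page. The default constants are the approximate fitted values reported by Hoffmann et al. (2022) and are used here only for illustration; the function names `chinchilla_loss` and `marginal_gains` are hypothetical.

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta.

    Default constants are approximate values reported by Hoffmann et al.
    (2022); they are assumptions for illustration, not values from this
    submission.
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta


def marginal_gains(n_params, token_counts):
    """Loss reduction between consecutive dataset sizes at a fixed model size."""
    losses = [chinchilla_loss(n_params, d) for d in token_counts]
    return [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]


if __name__ == "__main__":
    n_params = 1e9                                   # hypothetical 1B-parameter model
    tokens = [2e10 * 2 ** k for k in range(5)]       # 20B tokens, doubled four times
    for d, gain in zip(tokens[1:], marginal_gains(n_params, tokens)):
        print(f"tokens={d:.1e}  loss reduction from last doubling={gain:.4f}")
```

Even under this baseline form, each doubling of the dataset yields a smaller loss reduction than the previous one. As we read the abstract, the sub-scaling regime concerns improvements that decelerate beyond what such a fit predicts, which is why a generalized law is proposed.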