From Scaling Law to Sub-Scaling Law: Understanding the Diminishing Returns of Larger Models

Zhengyu Chen; Siqi Wang; Teng Xiao; Yudong Wang; Shiqi Chen; Xunliang Cai; Junxian He; Jingang Wang

28 Sept 2024 (modified: 18 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: scaling law, large language model

Abstract: Traditional scaling laws suggest that the performance of language models improves predictably as model or dataset size increases. However, recent work reports sub-scaling growth for large language models, in which performance improvements decelerate as the dataset or model size increases. This study systematically investigates the sub-scaling phenomenon through an extensive empirical analysis of over 400 models, ranging from 20 million to 7 billion parameters, trained with varying datasets and training strategies. We examine the phenomenon from two perspectives: data density and training strategy. Our findings indicate that sub-scaling arises primarily from high data density and non-optimal training resource allocation, and that both factors contribute more to performance deceleration than previously anticipated. High data density leads to diminishing marginal gains in performance, while optimal resource allocation is crucial for sustaining performance improvements. Further, we propose a sub-optimal scaling law that generalizes the Chinchilla scaling law to better predict performance and loss in sub-scaling regimes.
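For orientation, the Chinchilla scaling law that the abstract refers to is commonly written as L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The sketch below evaluates that baseline form only; it is not the generalized law proposed in this submission, which the abstract does not specify. The constants are the fitted values reported in the Chinchilla paper (Hoffmann et al., 2022), used here purely as illustrative assumptions, and the function names are hypothetical.

```python
# Baseline Chinchilla parametric loss, L(N, D) = E + A / N**alpha + B / D**beta.
# Assumption: default constants are the fit reported by Hoffmann et al. (2022);
# they are illustrative and are NOT taken from this submission.

def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted loss for a model with n_params parameters and n_tokens tokens."""
    return E + A / n_params ** alpha + B / n_tokens ** beta


def gain_from_doubling_data(n_params, n_tokens):
    """Loss reduction obtained by doubling the token count at fixed model size."""
    return chinchilla_loss(n_params, n_tokens) - chinchilla_loss(n_params, 2 * n_tokens)


if __name__ == "__main__":
    n_params = 1e9  # hypothetical 1B-parameter model
    for n_tokens in (2e10, 2e11, 2e12):
        print(f"D={n_tokens:.0e}  "
              f"loss={chinchilla_loss(n_params, n_tokens):.4f}  "
              f"gain from doubling D={gain_from_doubling_data(n_params, n_tokens):.4f}")
```

Even under this baseline form, each doubling of the data yields a smaller loss reduction than the previous one. The sub-scaling regime described in the abstract concerns cases where observed improvements decelerate further than such a fit would predict.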

Primary Area: foundation or frontier models, including LLMs

Submission Number: 13598
