Model Growth Schedule Learning via Optimal Path (SLOP) for Efficient LLM Pre-Training

26 Sept 2024 (modified: 23 Jan 2025) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0
Keywords: Model growth, Optimal growth schedule, Efficient LLM Pre-Training
Abstract: Existing training methods for Transformer-based large language models (LLMs) rely on training from scratch on massive amounts of data, which incurs high compute and time costs. Recent studies have demonstrated the great potential of improving LLM training efficiency by growing small pre-trained models into large ones, a technique known as model growth. Model growth involves two main research problems: the growth schedule and the growth operators. Existing research focuses on growth operators, detailing specific manipulations of potential dimensions to expand Transformer parameters. Few studies have investigated the optimal growth schedule, which requires integrating all possible growth operators to form an optimal multi-stage growth path. This work introduces SLOP, a growth Schedule Learning methodology via Optimal Path, for multi-stage model growth with minimal experimental training. SLOP uses marginal utility as a measure for an optimal schedule that balances training cost against model performance after multi-stage growth. With this measure, the objective of determining the optimal model growth path is converted into a dynamic programming problem, which is then solved in polynomial time. Empirical results demonstrate SLOP's theoretical validity and show that it is an efficient approach that outperforms alternative schedules in a variety of settings.
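The abstract frames schedule selection as a dynamic programming problem over candidate growth stages. Since the paper's exact formulation is not given here, the following is a minimal sketch under assumed definitions: a sorted set of candidate model sizes, a hypothetical marginal-utility score for each possible growth step, and a longest-path DP that recovers the best multi-stage schedule in polynomial time. The function and variable names are illustrative, not the authors' API.

```python
# Illustrative sketch only: SLOP's actual cost/utility definitions are not given
# in the abstract. We assume candidate model sizes and a hypothetical table
# utility[(i, j)] scoring a single growth step from size i to size j, then run a
# longest-path DP over increasing sizes, which is O(n^2) for n candidates.

def best_growth_schedule(sizes, utility):
    """Return the growth path from sizes[0] to sizes[-1] that maximizes the
    total (assumed) marginal utility, visiting sizes in increasing order."""
    n = len(sizes)
    best = [float("-inf")] * n   # best[j]: max utility of any path ending at j
    prev = [None] * n            # back-pointers to reconstruct the path
    best[0] = 0.0                # start from the smallest (seed) model

    for j in range(1, n):
        for i in range(j):
            score = best[i] + utility.get((i, j), float("-inf"))
            if score > best[j]:
                best[j], prev[j] = score, i

    # Walk back-pointers from the target (largest) model to the seed model.
    path, j = [], n - 1
    while j is not None:
        path.append(sizes[j])
        j = prev[j]
    return list(reversed(path)), best[n - 1]


if __name__ == "__main__":
    sizes = [0.1, 0.5, 1.0, 7.0]  # model sizes in billions of parameters (made up)
    utility = {(0, 1): 2.0, (0, 2): 2.5, (0, 3): 1.0,
               (1, 2): 1.0, (1, 3): 3.5, (2, 3): 2.2}
    path, total = best_growth_schedule(sizes, utility)
    print(path, total)  # [0.1, 0.5, 7.0] 5.5 : the best schedule skips the 1B stage
```

Under these made-up scores the DP prefers a two-step schedule that skips the 1B intermediate model, illustrating how a marginal-utility criterion can trade training cost against the benefit of additional growth stages.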
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6492