Efficient Construction of Model Family through Progressive Training Using Model Expansion

Published: 08 Jul 2025 · Last Modified: 26 Aug 2025 · COLM 2025 · CC BY 4.0
Keywords: pre-training, model family, compute efficiency
TL;DR: We propose a progressive training approach that efficiently builds a family of LLMs, reducing total computational requirements while achieving comparable or even better performance.
Abstract: As Large Language Models (LLMs) gain widespread practical application, offering model families with varying parameter sizes has become standard practice to accommodate diverse computational requirements. Traditionally, each model in the family is trained independently, incurring computational costs that scale additively with the number of models. In this work, we propose an efficient method for constructing model families via progressive training, where smaller models are incrementally expanded to larger sizes to create a complete model family. Through extensive experiments on a model family ranging from 1B to 8B parameters, we show that our approach reduces total computational cost by approximately 25% while maintaining comparable performance to independently trained models. Moreover, by strategically adjusting the maximum learning rate based on model size, our method outperforms independent training across various metrics. Beyond these improvements, our approach also fosters greater consistency in behavior across model sizes.
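To illustrate the general idea of progressive training via model expansion described in the abstract, the sketch below shows one common expansion operator (width padding plus depth duplication) applied to a toy layer stack. The abstract does not specify the paper's actual expansion method, learning-rate schedule, or architecture, so all function names, sizes, and initialization choices here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of progressive model expansion: grow a small model's
# width and depth, then continue pre-training from the expanded checkpoint.
import copy
import torch
import torch.nn as nn


def expand_width(linear: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Embed an existing Linear layer into a wider one.

    New rows/columns are zero-initialized, so on the original subspace the
    expanded layer initially reproduces the small layer's outputs.
    """
    wide = nn.Linear(new_in, new_out)
    with torch.no_grad():
        wide.weight.zero_()
        wide.bias.zero_()
        wide.weight[: linear.out_features, : linear.in_features] = linear.weight
        wide.bias[: linear.out_features] = linear.bias
    return wide


def expand_depth(layers: nn.ModuleList, new_depth: int) -> nn.ModuleList:
    """Grow depth by duplicating existing blocks in order."""
    grown = [copy.deepcopy(layers[i % len(layers)]) for i in range(new_depth)]
    return nn.ModuleList(grown)


# Example: expand a toy 2-layer, 256-wide stack to 4 layers, 512 wide.
small_blocks = nn.ModuleList([nn.Linear(256, 256) for _ in range(2)])
wide_blocks = nn.ModuleList([expand_width(b, 512, 512) for b in small_blocks])
large_blocks = expand_depth(wide_blocks, new_depth=4)
print(sum(p.numel() for p in large_blocks.parameters()))  # larger parameter count
```

The abstract also mentions adjusting the maximum learning rate based on model size; in a setup like the one above, that would amount to choosing a different peak learning rate for each expansion stage, though the specific schedule used in the paper is not given here.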
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 173