Keywords: pre-training, model family, compute efficiency
TL;DR: We propose a progressive training approach that efficiently builds a family of LLMs, reducing total computational requirements while achieving comparable or even better performance.
Abstract: As Large Language Models (LLMs) gain widespread practical application, offering model families with varying parameter sizes has become standard practice to accommodate diverse computational requirements. Traditionally, each model in the family is trained independently, incurring computational costs that scale additively with the number of models. In this work, we propose an efficient method for constructing model families via progressive training, where smaller models are incrementally expanded to larger sizes to create a complete model family. Through extensive experiments on a model family ranging from 1B to 8B parameters, we show that our approach reduces total computational cost by approximately 25% while maintaining performance comparable to independently trained models. Moreover, by strategically adjusting the maximum learning rate based on model size, our method outperforms independent training across various metrics. Beyond these improvements, our approach also fosters greater consistency in behavior across model sizes.
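The abstract mentions adjusting the maximum learning rate based on model size. As a rough illustration only, the sketch below shows a size-dependent peak learning rate plugged into a standard warmup-plus-cosine-decay schedule; the per-size values, the schedule shape, and the `MAX_LR_BY_SIZE` table are assumptions for the example, not the paper's actual hyperparameters or method.

```python
import math

# Assumed peak learning rates per model size (illustrative values only;
# larger models typically use smaller peak LRs).
MAX_LR_BY_SIZE = {"1B": 3e-4, "3B": 2.5e-4, "8B": 1.5e-4}

def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               max_lr: float, min_lr_ratio: float = 0.1) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr_ratio * max_lr."""
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    min_lr = min_lr_ratio * max_lr
    return min_lr + (max_lr - min_lr) * cosine

# Example: schedule for a hypothetical 8B stage of progressive training.
max_lr = MAX_LR_BY_SIZE["8B"]
for step in (0, 1_000, 50_000, 100_000):
    print(step, lr_at_step(step, total_steps=100_000,
                           warmup_steps=1_000, max_lr=max_lr))
```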
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 173