Branches Switching Based Pre-Training Strategy for Version Iteration of Large Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Due to the continuous emergence of online data, version iteration has become an indispensable requirement for Large Language Models (LLMs), which substantially increases their training cost. Hence, one of the pivotal challenges for LLMs is how to reduce the total training cost across different versions. To achieve a better balance between pre-training performance and training cost, we conduct a systematic investigation into the impact of various learning rate schedules. Extensive experiments show that commonly used learning rate schedules optimize the performance of the current version of an LLM, but overlook the mutual influence between the training processes of different versions. To address this issue, we design a pre-training strategy called Branches Switching based Pre-Training for training LLMs across different versions. Compared with pre-training each version from scratch, our strategy reduces the total training cost to 58\% while maintaining optimal pre-training performance.
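The abstract refers to commonly used learning rate schedules without spelling them out. As a rough illustration only, not the paper's method, the sketch below shows two schedules frequently compared in LLM pre-training: linear warmup followed by cosine decay versus warmup followed by a constant rate. All function names and hyperparameter values (peak_lr, warmup_steps, min_lr) are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): two learning rate schedules
# of the kind commonly compared in LLM pre-training.
import math

def warmup_cosine_lr(step, total_steps, peak_lr=3e-4, warmup_steps=2000, min_lr=3e-5):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def warmup_constant_lr(step, peak_lr=3e-4, warmup_steps=2000):
    """Linear warmup to peak_lr, then hold the learning rate constant."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

if __name__ == "__main__":
    total_steps = 100_000
    for step in (0, 1_000, 2_000, 50_000, total_steps - 1):
        print(step,
              f"cosine={warmup_cosine_lr(step, total_steps):.2e}",
              f"constant={warmup_constant_lr(step):.2e}")
```

A decayed schedule tends to favor the current version's final loss, while a constant (non-decayed) rate leaves the model in a state that is easier to resume from, which is the kind of trade-off across versions the paper investigates.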
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency
Languages Studied: English