Escaping The Plateau: Dynamic Context Length Adaptation for Efficient BERT Pretraining

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: Using dynamic context length to speed up convergence during pretraining
Abstract: We present a technique for dynamically shortening the pretraining time of BERT-based models. BERT-based models are a popular choice for pretraining research on a low budget; however, improvements can still be made to further lower monetary and time investments. We propose an approach that dynamically shortens the context length when a plateau, a region of slow loss reduction, is detected, then returns to the original value after the plateau is escaped. We show that this change forces an abrupt exit from the plateau, reducing the time it takes to reach 90% of the final baseline performance by a factor of 2.
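The abstract describes a plateau-triggered switch between a full and a shortened context length. The following is a minimal sketch of how such a controller could look; the class name, window size, thresholds, and plateau criterion are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


class DynamicContextLength:
    """Shorten the context length when training loss plateaus; restore it afterwards."""

    def __init__(self, full_len=512, short_len=128, window=100, plateau_tol=1e-3):
        self.full_len = full_len        # original pretraining context length
        self.short_len = short_len      # reduced length used inside a plateau
        self.window = window            # number of recent loss values to consider
        self.plateau_tol = plateau_tol  # minimum relative improvement to count as progress
        self.losses = deque(maxlen=window)
        self.in_plateau = False

    def current_length(self):
        return self.short_len if self.in_plateau else self.full_len

    def update(self, loss):
        """Record one training loss value and return the context length to use next."""
        self.losses.append(float(loss))
        if len(self.losses) < self.window:
            return self.current_length()
        # Compare the average loss of the newer half of the window to the older half.
        half = self.window // 2
        vals = list(self.losses)
        old_avg = sum(vals[:half]) / half
        new_avg = sum(vals[half:]) / half
        improvement = (old_avg - new_avg) / max(old_avg, 1e-12)
        if not self.in_plateau and improvement < self.plateau_tol:
            self.in_plateau = True   # plateau detected: switch to the short context
            self.losses.clear()
        elif self.in_plateau and improvement >= self.plateau_tol:
            self.in_plateau = False  # plateau escaped: return to the full context
            self.losses.clear()
        return self.current_length()
```

In a pretraining loop, `update(loss)` would be called once per step and the returned value used to truncate or re-pack the next batch of sequences; the exact detection rule and switching schedule in the paper may differ.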
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings/efficiency
Languages Studied: English