Escaping The Plateau: Dynamic Context Length Adaptation for Efficient BERT Pretraining

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: Using dynamic context length to speed up convergence during pretraining
Abstract: We present a technique for dynamically shortening the pretraining time of BERT-based models. BERT-based models are a popular choice for pretraining research on a low budget; however, improvements can still be made to further lower monetary and time investments. We propose an approach that dynamically shortens the context length when a plateau, a region of slow loss reduction, is detected, then returns to the original value after the plateau is escaped. We show that this change forces an abrupt exit from the plateau, reducing the time it takes to reach 90% of the final baseline performance by a factor of 2.
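The abstract describes a plateau-triggered switch between a full and a shortened context length. The following is a minimal sketch of how such a controller could look; the class name, window size, thresholds, and plateau criterion are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


class DynamicContextLength:
    """Shorten the context length when training loss plateaus; restore it afterwards."""

    def __init__(self, full_len=512, short_len=128, window=100, plateau_tol=1e-3):
        self.full_len = full_len        # original pretraining context length
        self.short_len = short_len      # reduced length used inside a plateau
        self.window = window            # number of recent loss values to consider
        self.plateau_tol = plateau_tol  # minimum relative improvement to count as progress
        self.losses = deque(maxlen=window)
        self.in_plateau = False

    def current_length(self):
        return self.short_len if self.in_plateau else self.full_len

    def update(self, loss):
        """Record one training loss value and return the context length to use next."""
        self.losses.append(float(loss))
        if len(self.losses) < self.window:
            return self.current_length()
        # Compare the average loss of the newer half of the window to the older half.
        half = self.window // 2
        vals = list(self.losses)
        old_avg = sum(vals[:half]) / half
        new_avg = sum(vals[half:]) / half
        improvement = (old_avg - new_avg) / max(old_avg, 1e-12)
        if not self.in_plateau and improvement < self.plateau_tol:
            self.in_plateau = True   # plateau detected: switch to the short context
            self.losses.clear()
        elif self.in_plateau and improvement >= self.plateau_tol:
            self.in_plateau = False  # plateau escaped: return to the full context
            self.losses.clear()
        return self.current_length()
```

In a pretraining loop, `update(loss)` would be called once per step and the returned value used to truncate or re-pack the next batch of sequences; the exact detection rule and switching schedule in the paper may differ.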
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings/efficiency
Languages Studied: English