Making Transformers Learn Faster by Taking Notes on the Fly

03 Feb 2023 (modified: 02 May 2023) · Submitted to Blogposts @ ICLR 2023
Abstract: Transformers have become the cornerstone of major machine learning domains such as Natural Language Processing (NLP) and Computer Vision (CV). In NLP, transformers achieve superior performance because they can be trained on massive amounts of unlabeled data. However, they cannot use this data efficiently due to the heavy-tailed distribution of words in natural language corpora: a large proportion of words appear only a few times in the text, so their embeddings receive few updates and remain noisy. This is one of the major reasons transformers require so much time and compute to train. The authors of the paper propose a lightweight memory-based strategy to better optimize the embeddings of rare words: during training, a dictionary mapping rare words to their contextual information is maintained and updated whenever a rare word occurs. Their method reduces pretraining time by 60% while reaching the same performance as the baseline training strategy.
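
The abstract's "dictionary that maps rare words to their contextual information" can be pictured with a minimal sketch like the one below. It assumes the set of rare-token IDs is known in advance, that a word's "contextual information" is summarized by the mean embedding of its surrounding tokens, and that notes are blended back into the input embedding with a fixed mixing weight; the class and method names are illustrative, not the authors' actual implementation.

```python
# Minimal sketch of a note dictionary for rare words (not the paper's code).
import torch


class NoteDictionary:
    def __init__(self, rare_token_ids, dim, momentum=0.9):
        # One note vector per rare token, updated as an exponential moving average.
        self.notes = {tid: torch.zeros(dim) for tid in rare_token_ids}
        self.momentum = momentum

    def update(self, token_id, context_embeddings):
        """Record the context of a rare word when it occurs.

        context_embeddings: (window, dim) embeddings of the tokens
        surrounding the rare word in the current sentence.
        """
        if token_id not in self.notes:
            return
        summary = context_embeddings.mean(dim=0)  # pool the local context
        old = self.notes[token_id]
        self.notes[token_id] = self.momentum * old + (1 - self.momentum) * summary

    def lookup(self, token_id):
        """Return the stored note for a rare token (None for frequent tokens)."""
        return self.notes.get(token_id)


# Usage: when a rare word appears, mix its note into the input embedding so the
# model sees its accumulated context, then refresh the note with the new context.
dim = 8
notes = NoteDictionary(rare_token_ids=[42], dim=dim)
word_embedding = torch.randn(dim)
context = torch.randn(5, dim)              # embeddings of nearby tokens
note = notes.lookup(42)
enriched = word_embedding + 0.5 * note     # 0.5 is an assumed mixing weight
notes.update(42, context)
```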
Blogpost Url: https://iclr-blogposts.github.io/staging/blog/2023/taking_notes_on_the_fly
ICLR Papers: https://arxiv.org/abs/2008.01466
ID Of The Authors Of The ICLR Paper: ~Qiyu_Wu1
Conflict Of Interest: No
