Efficient Training of Language Models using Few-Shot Learning

Sashank J. Reddi; Sobhan Miryoosefi; Stefani Karp; Shankar Krishnan; Satyen Kale; Seungyeon Kim; Sanjiv Kumar

Efficient Training of Language Models using Few-Shot Learning

Sashank J. Reddi, Sobhan Miryoosefi, Stefani Karp, Shankar Krishnan, Satyen Kale, Seungyeon Kim, Sanjiv Kumar

Published: 24 Apr 2023, Last Modified: 21 Jun 2023ICML 2023 PosterEveryoneRevisions

Abstract: Large deep learning models have achieved state-of-the-art performance across various natural language processing (NLP) tasks and demonstrated remarkable few-shot learning performance. However, training them is often challenging and resource-intensive. In this paper, we study an efficient approach to train language models using few-shot learners. We show that, by leveraging the fast learning nature of few-shot learners, one can train language models efficiently in a stagewise manner. Our main insight is that stacking a good few-shot learner on a good small language model provides a good initializer for a larger language model. Using this insight and building upon progressive stacking approaches, we develop novel approaches for training such networks in a stagewise manner. Furthermore, we also provide a theoretical framework and accompanying empirical studies to support our insights, thereby creating a theoretical foundation for progressive stacking. Finally, we provide empirical results to demonstrate the effectiveness of our approach in reducing the training time of few-shot learners.

Submission Number: 4806

Loading