Down-Scaling Language Models in the Era of Scale Is All You Need

Anonymous

16 Jul 2022 (modified: 05 May 2023) · ACL ARR 2022 July Blind Submission · Readers: Everyone
Abstract: Large language models are highly resource-intensive, both financially and environmentally, and require huge amounts of training data, which are only available for a small number of languages. In this work, we focus on low-resource settings. We build language models for two languages, trained with different configurations, and evaluate them on several NLP tasks. Specifically, we analyze three lightweight BERT architectures (with 124M, 51M, and 16M parameters) trained on small corpora (125M, 25M, and 5M words) for both Basque and Spanish. The trained models are compared with traditional, non-neural supervised systems. We also present an estimate of the resources and CO$_2$ emissions needed by each approach, which calls for a compromise between raw performance and environmental cost.
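To illustrate what a "lightweight BERT architecture" might look like in practice, the sketch below defines a scaled-down BERT configuration and counts its parameters. It assumes the Hugging Face transformers library; the hyperparameter values (vocabulary size, hidden size, number of layers) are illustrative assumptions, not the configurations actually used in the paper.

```python
# Minimal sketch: defining a reduced-size BERT and counting its parameters.
# The hyperparameter values are illustrative, not the paper's actual settings.
from transformers import BertConfig, BertForMaskedLM

# A scaled-down configuration (fewer layers and a smaller hidden size than BERT-base).
small_config = BertConfig(
    vocab_size=30000,        # assumed subword vocabulary size
    hidden_size=384,         # BERT-base uses 768
    num_hidden_layers=6,     # BERT-base uses 12
    num_attention_heads=6,
    intermediate_size=1536,  # typically 4 x hidden_size
)

model = BertForMaskedLM(small_config)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")
```

Varying the hidden size and layer count in this way is one plausible route to parameter budgets in the tens of millions, in the spirit of the 124M/51M/16M models the abstract describes.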
Paper Type: short