Keywords: LLM, NAS, AutoML, Genetic Algorithms
TL;DR: A case study of neural architecture search on DistilBERT under a limited compute budget, highlighting the potential of pre-training validation loss and epoch-level stopping strategies for efficient resource allocation.
Abstract: Transformer-based language models have achieved milestones in natural language processing, but they come with challenges, mainly due to their computational footprint. Applying automated machine learning to these models can democratize their use and foster further research and development. We present a case study using neural architecture search (NAS) to optimize DistilBERT in a resource-constrained environment with a $4\,000$ GPU-hour budget. We employ an evolutionary algorithm that uses a two-level hierarchical search space and a segmented pipeline for component enhancement. Although a larger compute budget would be required to reach state-of-the-art results, our results show efficient exploration and a strong correlation between pre-training and downstream performance, suggesting that pre-training validation loss can serve as a cutoff criterion during model training. Finally, our learning-curve analysis highlights the potential for efficient resource allocation through an epoch-level stopping strategy that directs compute towards more promising candidate models. Future work should focus on scaling these insights to larger language models and more diverse tasks.
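To illustrate the epoch-level stopping idea mentioned in the abstract, the following minimal Python sketch (our own illustration, not the paper's implementation; the function `should_stop` and the `patience`/`min_delta` parameters are hypothetical) prunes a candidate architecture once its pre-training validation loss plateaus, freeing budget for other candidates.

```python
# Minimal sketch (assumption, not the authors' method): stop training a
# candidate architecture when its pre-training validation loss has not
# improved by at least `min_delta` over the last `patience` epochs.

def should_stop(val_losses, patience=2, min_delta=0.01):
    """Return True if the last `patience` epochs brought no meaningful
    improvement over the best validation loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Example: a candidate whose pre-training validation loss has flattened out.
history = [3.10, 2.64, 2.40, 2.40, 2.40]
print(should_stop(history))  # True -> stop early, reallocate GPU hours
```

In an evolutionary NAS loop, such a check would run after each pre-training epoch of every candidate, so only architectures whose loss keeps decreasing consume the remaining GPU-hour budget.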
Submission Number: 35