Gradient-free Proxy for Efficient Language Model Search

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: gradient-free, neural architecture search, efficient language model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The rising demand for efficient natural language processing (NLP) systems has underscored the importance of developing lightweight language models. However, prevailing neural architecture search (NAS) approaches often suffer from biased evaluation metrics and computational inefficiency. This paper introduces weight-weighted PCA (W-PCA), a novel zero-shot NAS method tailored to lightweight language models. Our approach combines two evaluation proxies, the parameter count and the principal component analysis (PCA) value of the feed-forward network (FFN) layers, to provide a comprehensive and unbiased assessment. Moreover, by eliminating the need for gradient computation, it reduces evaluation time and thereby makes the design and evaluation of lightweight language models more efficient. Comparative experiments on the GLUE and SQuAD datasets demonstrate that our approach not only substantially reduces training time compared with one-shot NAS methods but also achieves higher test scores than previous state-of-the-art training-based methods. Furthermore, ranking evaluations on a dataset sampled from the FlexiBERT search space show that our approach attains superior ranking correlation and further reduces solving time compared with other zero-shot NAS methods that require gradient computation.
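The abstract does not spell out how the PCA proxy is computed or how it is combined with the parameter count. A minimal sketch of one plausible reading, assuming the PCA value is the number of principal components needed to explain a fixed fraction of the variance of an FFN layer's hidden activations (collected with a single forward pass, so no gradients are required), and assuming a simple multiplicative combination with the parameter count. The function names and the combination rule here are illustrative assumptions, not the paper's actual formula:

```python
import numpy as np

def pca_value(activations: np.ndarray, threshold: float = 0.99) -> int:
    """Number of principal components needed to explain `threshold`
    of the variance of FFN hidden activations (rows = tokens,
    cols = hidden units). Gradient-free: only forward-pass
    activations are needed."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Singular values of the centered data give the PCA spectrum.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    ratio = np.cumsum(var) / var.sum()
    # First index at which the cumulative variance ratio reaches the threshold.
    return int(np.searchsorted(ratio, threshold) + 1)

def w_pca_score(ffn_activations: list, param_count: int) -> float:
    """Hypothetical W-PCA-style proxy: per-layer PCA values summed
    and weighted by the parameter count (one reading of
    'weight-weighted'). The actual combination used in the paper
    is not given in the abstract."""
    return param_count * sum(pca_value(a) for a in ffn_activations)
```

Candidate architectures in the search space would then be ranked by this score without any training, which is what makes the proxy cheap compared to one-shot NAS.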
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3977