Can Neural Architecture Search Help us Find Faster LLM Architectures? Experiments with GPT-2 based Text Predictor
Abstract: Inference with Large Language Models (LLMs) is costly and often dominates the life-cycle cost of LLM-based services. Neural Architecture Search (NAS) can automatically find architectures that optimize the trade-off between accuracy and inference cost. However, NAS for LLM architectures is computationally prohibitive. We apply the recently proposed LiteTransformerSearch algorithm to reduce the inference latency of a GPT-2-based text prediction system by 25% without compromising its accuracy. In the process, we discover new constraints that the optimal neural architectures satisfy, which are therefore useful in practice for further reducing the computational cost of NAS.
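To make the approach described in the abstract concrete, below is a minimal, hypothetical sketch of searching GPT-2 configurations for a good latency-accuracy trade-off. It is not the paper's actual pipeline: the search-space bounds, the simple random-search loop, and the helper names (`sample_config`, `median_latency`, `decoder_params`) are illustrative assumptions; the use of decoder (non-embedding) parameter count as a training-free quality proxy follows the idea behind LiteTransformerSearch.

```python
# Hypothetical sketch: random search over GPT-2 configurations, keeping the
# Pareto front over (measured latency, decoder-parameter-count proxy).
# Assumptions are labeled inline; this is not the authors' implementation.
import random
import time

import torch
from transformers import GPT2Config, GPT2LMHeadModel


def sample_config() -> GPT2Config:
    """Sample a GPT-2 variant from a hypothetical search space (head dim fixed at 64)."""
    n_layer = random.choice([6, 8, 10, 12])
    n_head = random.choice([8, 10, 12])
    return GPT2Config(n_layer=n_layer, n_head=n_head, n_embd=64 * n_head, n_positions=256)


def median_latency(model: GPT2LMHeadModel, seq_len: int = 64, n_runs: int = 5) -> float:
    """Median wall-clock latency (seconds) of one forward pass on random token ids."""
    model.eval()
    ids = torch.randint(0, model.config.vocab_size, (1, seq_len))
    times = []
    with torch.no_grad():
        for _ in range(n_runs):
            start = time.perf_counter()
            model(ids)
            times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]


def decoder_params(model: GPT2LMHeadModel) -> int:
    """Non-embedding (decoder) parameter count, used here as a proxy for model quality."""
    total = sum(p.numel() for p in model.parameters())
    emb = sum(p.numel() for p in model.transformer.wte.parameters())
    emb += sum(p.numel() for p in model.transformer.wpe.parameters())
    return total - emb


# Evaluate candidates without any training, then keep the Pareto-optimal ones:
# no other candidate is both faster and has a higher quality proxy.
candidates = []
for _ in range(10):
    model = GPT2LMHeadModel(sample_config())
    candidates.append((median_latency(model), decoder_params(model), model.config))

pareto = [
    c for c in candidates
    if not any(o[0] <= c[0] and o[1] >= c[1] and (o[0] < c[0] or o[1] > c[1])
               for o in candidates)
]
for latency, params, cfg in sorted(pareto, key=lambda c: c[0]):
    print(f"{latency * 1000:6.1f} ms  {params / 1e6:6.1f}M decoder params  "
          f"layers={cfg.n_layer} heads={cfg.n_head} d_model={cfg.n_embd}")
```

The key design choice this sketch illustrates is the training-free proxy: because candidates are ranked without training each one, the search cost stays far below that of conventional NAS, which is what makes the approach feasible for LLM-scale architectures.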
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches for low-compute settings (efficiency)
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.