Abstract: The rapid adoption of large language models (LLMs) has led to a growing number of com- panies offering generative LLMs as callable services at varying costs. We find that popular generative LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, exhibit heterogeneous pricing structures, with fees that can differ by two orders of magnitude and heterogeneous performance across tasks and input queries. This makes it challenging for users to decide which generative LLM APIs to utilize for their applications and budget. Motivated by these findings, we propose FrugalGPT, an algorithmic framework that adaptively selects which generative LLMs to use for different queries to reduce cost and improve accuracy. Our experiments demonstrate that, for a range of natural language tasks including news classi- fication, reading comprehension, and scientific question answering, FrugalGPT can match the performance of the best individual generative LLM (e.g., GPT-4) with up to a 98% cost reduction or improve the accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented in this paper lay a foundation for using LLMs sustainably and efficiently.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yu-Xiong_Wang1
Submission Number: 2934
Loading