FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Published: 20 Dec 2024, Last Modified: 20 Dec 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: The rapid adoption of large language models (LLMs) has led to a growing number of companies offering generative LLMs as callable services at varying costs. We find that popular generative LLM APIs, such as GPT-4, Gemini 1.5, and Claude 3.5, exhibit heterogeneous pricing structures, with fees that can differ by two orders of magnitude, and heterogeneous performance across tasks and input queries. This makes it challenging for users to decide which generative LLM APIs to use for their applications and budget. Motivated by these findings, we propose FrugalGPT, an algorithmic framework that adaptively selects which generative LLMs to use for different queries to reduce cost and improve accuracy. Our experiments demonstrate that, for a range of natural language tasks including news classification, reading comprehension, and scientific question answering, FrugalGPT can match the performance of the best individual generative LLM (e.g., GPT-4) with up to a 98% cost reduction or improve the accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented in this paper lay a foundation for using LLMs sustainably and efficiently.
Submission Length: Regular submission (no more than 12 pages of main content)
Supplementary Material: pdf
Assigned Action Editor: ~Yu-Xiong_Wang1
Submission Number: 2934