FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Lingjiao Chen; Matei Zaharia; James Zou

Published: 20 Dec 2024, Last Modified: 20 Dec 2024. Accepted by TMLR. License: CC BY 4.0

Abstract: The rapid adoption of large language models (LLMs) has led to a growing number of companies offering generative LLMs as callable services at varying costs. We find that popular generative LLM APIs, such as GPT-4, Gemini 1.5, and Claude 3.5, exhibit heterogeneous pricing structures, with fees that can differ by two orders of magnitude and heterogeneous performance across tasks and input queries. This makes it challenging for users to decide which generative LLM APIs to utilize for their applications and budget. Motivated by these findings, we propose FrugalGPT, an algorithmic framework that adaptively selects which generative LLMs to use for different queries to reduce cost and improve accuracy. Our experiments demonstrate that, for a range of natural language tasks including news classification, reading comprehension, and scientific question answering, FrugalGPT can match the performance of the best individual generative LLM (e.g., GPT-4) with up to a 98% cost reduction or improve the accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented in this paper lay a foundation for using LLMs sustainably and efficiently.
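The abstract does not spell out how the adaptive selection works. One plausible reading is a cascade: send each query to the cheapest model first and escalate to a costlier one only when a scoring function judges the answer unreliable. The sketch below illustrates that idea under stated assumptions; the `Model` class, the `cascade` function, the toy scorer, the thresholds, and the cost figures are all hypothetical and are not taken from the paper or from any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str                     # hypothetical label, not a real API identifier
    cost: float                   # assumed cost per query, in arbitrary units
    answer: Callable[[str], str]  # stand-in for a call to an LLM service
    threshold: float              # accept the answer if its score reaches this


def cascade(query: str, models: List[Model],
            score: Callable[[str, str], float],
            budget: float) -> Tuple[str, str, float]:
    """Try models from cheapest to most expensive. Stop at the first answer
    whose score meets that model's threshold, or when the next call would
    exceed the budget. Returns (answer, model name, total cost spent)."""
    spent = 0.0
    result = ("", "", 0.0)
    for m in sorted(models, key=lambda m: m.cost):
        if spent + m.cost > budget:
            break
        spent += m.cost
        ans = m.answer(query)
        result = (ans, m.name, spent)
        if score(query, ans) >= m.threshold:
            break
    return result


def toy_score(query: str, answer: str) -> float:
    # Toy scorer (assumption): trusts any non-empty answer other than "unsure".
    return 0.0 if answer in ("", "unsure") else 1.0


# Two made-up models: a cheap one that only handles easy queries,
# and an expensive one whose answer is always accepted.
small = Model("small", 1.0, lambda q: "sports" if "match" in q else "unsure", 0.5)
large = Model("large", 50.0, lambda q: "business", 0.0)

easy = cascade("the match ended 2-1", [large, small], toy_score, budget=100.0)
hard = cascade("shares fell on Tuesday", [large, small], toy_score, budget=100.0)
print(easy)
print(hard)
```

In this toy setup the easy query is answered by the cheap model alone, while the hard query pays for both calls. Whether such a scheme saves money in practice depends on how often cheap answers are accepted and on how well the scorer separates reliable answers from unreliable ones.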

Submission Length: Regular submission (no more than 12 pages of main content)

Supplementary Material: pdf

Assigned Action Editor: ~Yu-Xiong_Wang1

Submission Number: 2934