Less is More: Using Multiple LLMs for Applications with Lower Costs

Lingjiao Chen; Matei Zaharia; James Zou

Published: 20 Jun 2023, Last Modified: 16 Jul 2023

Venue: ES-FoMO 2023 Poster

Keywords: LLM, inference, cloud AI market

TL;DR: We adopt multiple LLM services to match GPT-4's performance with up to 98% cost savings

Abstract: Large language models (LLMs) are increasingly used for querying purposes, but their associated costs vary significantly. This study investigates the pricing structures of popular LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, revealing substantial fee differences. To mitigate the expense of using LLMs on extensive queries and text, we propose three strategies: prompt adaptation, LLM approximation, and LLM cascade. We present FrugalGPT, an adaptable LLM cascade that intelligently selects LLM combinations to reduce costs by up to 98% while matching or improving the accuracy of individual LLMs. This work establishes a foundation for sustainable and efficient LLM utilization, offering valuable insights and practical techniques for users.
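
Illustration (not from the paper): the abstract gives no implementation details, so the following is only a minimal sketch of one way an LLM cascade could work. It assumes the models are tried from cheapest to most expensive and that a scoring function decides whether a reply is reliable enough to return. The names `Model`, `cascade`, `score`, the stub models, and the cost figures are all hypothetical and do not describe FrugalGPT's actual code or any provider's API.

```python
from typing import Callable, List, NamedTuple, Tuple


class Model(NamedTuple):
    name: str
    cost: float                    # hypothetical cost per query, arbitrary units
    answer: Callable[[str], str]   # stand-in for a call to an LLM API
    threshold: float               # accept the reply if its score reaches this


def cascade(
    query: str,
    models: List[Model],
    score: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Try models in the given order; return (reply, model name, total cost).

    The reply of the last model is always accepted, so the cascade
    never ends without an answer as long as `models` is non-empty.
    """
    total = 0.0
    for i, model in enumerate(models):
        total += model.cost
        reply = model.answer(query)
        is_last = i == len(models) - 1
        if is_last or score(query, reply) >= model.threshold:
            return reply, model.name, total
    raise ValueError("models must be non-empty")


# Stub models for demonstration only: the cheap one knows a single answer.
def cheap_answer(query: str) -> str:
    return "4" if query == "2+2" else "unsure"


def strong_answer(query: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(query, "unknown")


def score(query: str, reply: str) -> float:
    # Toy reliability score: distrust an explicit "unsure".
    return 0.0 if reply == "unsure" else 1.0


models = [
    Model("cheap", 1.0, cheap_answer, 0.5),
    Model("strong", 30.0, strong_answer, 0.5),
]

easy = cascade("2+2", models, score)
hard = cascade("capital of France", models, score)
```

In this toy setup the easy query is answered by the cheap model alone, while the harder query falls through to the expensive model and pays for both calls. Any savings in practice would depend on how many queries the cheaper models can handle reliably and on the quality of the scoring function.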

Submission Number: 54
