Less is More: Using Multiple LLMs for Applications with Lower Costs
Keywords: LLM, inference, cloud AI market
TL;DR: We adopt multiple LLM services to match GPT-4's performance with up to 98% cost savings
Abstract: Large language models (LLMs) are increasingly queried through commercial APIs, but the associated costs vary significantly. This study examines the pricing structures of popular LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, and finds substantial fee differences among them. To reduce the expense of running LLMs over large collections of queries and text, we propose three strategies: prompt adaptation, LLM approximation, and LLM cascade. We present FrugalGPT, an adaptable LLM cascade that learns which combinations of LLMs to use for different queries, reducing costs by up to 98% while matching or improving the accuracy of the best individual LLM. This work establishes a foundation for sustainable and efficient LLM use, offering practical techniques for users.
Submission Number: 54
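
Illustrative sketch: the LLM cascade idea named in the abstract can be pictured as querying models from cheapest to most expensive and stopping as soon as an answer looks reliable enough. The Python below is a minimal sketch under stated assumptions, not the FrugalGPT implementation: the tier names, per-call costs, toy model functions, scorer, and threshold are all hypothetical stand-ins, and no real LLM API is called.

```python
from typing import Callable, List, Tuple

# One tier of the cascade: (name, cost per call, model function).
# All three fields are hypothetical placeholders for real LLM services.
Tier = Tuple[str, float, Callable[[str], str]]


def cascade(
    query: str,
    tiers: List[Tier],
    scorer: Callable[[str, str], float],
    threshold: float = 0.8,
) -> Tuple[str, str, float]:
    """Try tiers in order; stop at the first answer scoring >= threshold.

    Returns (answer, name of the tier that produced it, total cost spent).
    If no tier passes, the last tier's answer is returned.
    """
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    total_cost = 0.0
    answer, used = "", ""
    for name, cost, model in tiers:
        answer, used = model(query), name
        total_cost += cost  # every model that is called is paid for
        if scorer(query, answer) >= threshold:
            break
    return answer, used, total_cost


# Toy stand-ins (assumptions): a cheap model that only handles one query,
# and an expensive model that handles both.
def cheap_model(query: str) -> str:
    return "4" if query == "2+2" else "unsure"


def expensive_model(query: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(query, "unsure")


def toy_scorer(query: str, answer: str) -> float:
    # Assumed reliability score: reject explicit "unsure" answers.
    return 0.0 if answer == "unsure" else 1.0


TIERS: List[Tier] = [
    ("cheap", 1.0, cheap_model),
    ("expensive", 30.0, expensive_model),
]

easy = cascade("2+2", TIERS, toy_scorer)
hard = cascade("capital of France", TIERS, toy_scorer)
```

In this toy setup the easy query is answered by the cheap tier alone, while the hard query falls through to the expensive tier and pays for both calls. The savings therefore depend on how many queries the cheaper tiers can answer acceptably and on how well the scorer detects unreliable answers.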