SELECTLLM: A Framework for Quality-Aware, Cost-Efficient LLM Usage

ACL ARR 2024 June Submission 4027 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Generative AI, and LLMs in particular, is now heavily used for document processing tasks such as question answering and document summarization. Enterprises incur substantial costs operating or accessing LLMs for their respective use cases. In this work, we propose optimizing the usage costs of LLMs in a quality-aware manner for document summarization tasks. Specifically, we exploit the variability of LLM performance across different types and formats of data to maximize output quality while keeping expected cost under a budget and latency below a threshold. This presents two challenges: 1) estimating the output quality of LLMs at runtime without invoking each LLM, and 2) optimally allocating queries to LLMs such that the objectives are optimized and the constraints are satisfied. We propose a model to predict the output quality of LLMs on text summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study the problems both theoretically and empirically. Our methods reduce costs by $40\%$-$90\%$ while improving quality by $4\%$-$7\%$. In addition to the quantitative results, we show through a user study that our quality estimates largely align with human preferences. We release the annotated datasets as open source to the community for further research and exploration.
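The allocation step described in the abstract can be pictured as a linear program over fractional query-to-LLM assignments, followed by rounding. The sketch below is an illustrative reconstruction of that idea, not the authors' actual formulation: the variable names (quality, cost, latency, budget, max_latency), the aggregate latency constraint, and the randomized rounding scheme are all assumptions for exposition.

```python
# Minimal sketch of an LP-relaxation-plus-rounding selector, assuming:
# quality, cost, latency are (n_queries, n_llms) arrays of predicted
# per-query scores; budget and max_latency are scalar bounds. The paper's
# exact constraints and rounding procedure may differ.
import numpy as np
from scipy.optimize import linprog

def select_llms(quality, cost, latency, budget, max_latency, seed=0):
    n_q, n_m = quality.shape
    n_vars = n_q * n_m

    # Objective: maximize total predicted quality -> minimize its negation.
    c = -quality.ravel()

    # Inequality constraints: total expected cost <= budget; latency is
    # treated here as a single aggregate bound for simplicity.
    A_ub = np.vstack([cost.ravel(), latency.ravel()])
    b_ub = np.array([budget, max_latency])

    # Equality constraints: each query receives exactly one unit of
    # (fractional) LLM assignment mass.
    A_eq = np.zeros((n_q, n_vars))
    for i in range(n_q):
        A_eq[i, i * n_m:(i + 1) * n_m] = 1.0
    b_eq = np.ones(n_q)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * n_vars, method="highs")
    frac = res.x.reshape(n_q, n_m)

    # Randomized rounding: interpret each query's fractional row as a
    # probability distribution over LLMs and sample one.
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(n_m, p=row / row.sum()) for row in frac])
```

Under this reading, the quality-prediction model supplies the quality matrix at runtime without invoking every LLM, and the LP plus rounding yields an assignment whose expected cost respects the budget.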
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Cost efficiency, LLM selection, Quality estimation
Contribution Types: Approaches to low-resource settings, Data resources, Theory
Languages Studied: English
Submission Number: 4027