MixLLM: Selecting Large Language Models with High-Quality Results and Minimum Inference Cost for Multi-Stage Complex Tasks
Keywords: LLM selection, multi-stage tasks, high-quality results, minimum inference cost, quality-cost trade-off
TL;DR: We study the problem of selecting LLMs for multi-stage complex tasks to preserve high-quality results with minimum LLM inference cost.
Abstract: We study the problem of selecting LLMs for multi-stage complex tasks to jointly
optimize final result quality and LLM inference cost. Existing approaches primarily target simple tasks, or optimize only for improving result quality
or for reducing cost, overlooking the trade-off between the two. We address this
gap by systematically investigating LLM performance in multi-stage workflows.
Inspired by our findings from real-world applications, we formalize the LLM
selection problem as a constraint-based optimization task with desirable properties:
it guarantees a lower bound on accuracy, minimizes LLM inference cost, and tolerates
performance fluctuations caused by LLM stochasticity, making it practical
for users. We further introduce MixLLM, a search framework that leverages the
exploration–exploitation principle to adaptively balance result quality and LLM
inference cost. MixLLM is carefully designed to efficiently identify a (near-)optimal solution with minimal exploration and to terminate automatically and
early through search-space pruning. Experimental results demonstrate that, compared
to using a single powerful commercial or open-source LLM, or selecting LLMs
via existing state-of-the-art methods, our approach not only improves result quality (by 1%–16%) but also significantly reduces inference cost (by 18%–92%). In addition, our approach efficiently adapts to different tasks, methods, and datasets,
demonstrating its practicality and robustness for multi-stage complex tasks.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11091