MixLLM: Selecting Large Language Models with High-Quality Results and Minimum Inference Cost for Multi-Stage Complex Tasks
Keywords: LLM selection, multi-stage tasks, high-quality results, minimum inference cost, quality-cost trade-off
TL;DR: We study the problem of selecting LLMs for multi-stage complex tasks to preserve high-quality results with minimum LLM inference cost.
Abstract: We study the problem of selecting LLMs for multi-stage complex tasks to jointly
optimize final result quality and LLM inference cost. Existing approaches primarily target simple tasks, or optimize only for improving result quality
or for reducing cost, overlooking the trade-off between the two. We address this
gap by systematically investigating LLM performance in multi-stage workflows.
Inspired by our findings from real-world applications, we formalize the LLM
selection problem as a constraint-based optimization task with desirable properties:
it guarantees a lower bound on accuracy, minimizes LLM inference cost, and tolerates
performance fluctuations caused by LLM stochasticity, making it practical
for users. We further introduce MixLLM, a search framework that leverages the
exploration–exploitation principle to adaptively balance result quality and LLM
inference cost. MixLLM is carefully designed to efficiently identify a (near-)optimal solution with minimal exploration and to terminate automatically and
early through search-space pruning. Experimental results demonstrate that, compared
to using a single powerful commercial or open-source LLM, or selecting LLMs
via existing state-of-the-art methods, our approach not only improves result quality (by 1%–16%) but also significantly reduces inference cost (by 18%–92%). In addition, our approach efficiently adapts to different tasks, methods, and datasets,
demonstrating its practicality and robustness for multi-stage complex tasks.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11091