LLMSELECTOR: Learning to Select Models in Compound AI Systems

Published: 10 Jun 2025, Last Modified: 29 Jun 2025 · CFAgentic @ ICML'25 Oral · CC BY-NC-ND 4.0
Keywords: LLM; agentic AI; compound AI systems; model selection
TL;DR: We study how to optimize model selection in compound AI systems
Abstract: Compound AI systems that combine multiple LLM calls, such as Self-Refine and Multiagent-Debate, are increasingly critical to AI advancements. Perhaps surprisingly, we find empirically that choosing different models for different modules has a substantial effect on these systems' performance. Thus, we ask a core question in compound AI systems: for each LLM call or module in the system, how should one decide which LLM to use? As a first step, we formally show that the model selection problem (MSP) is computationally intractable. Next, we propose LLMSELECTOR, a principled framework that learns LLMs' strengths and weaknesses across different modules through an LLM evaluator and then performs an efficient optimization to select which models to use in any given compound system with a bounded number of modules. Our theoretical analysis gives mathematical conditions under which LLMSELECTOR only requires LLM calls scaling linearly with the number of modules and the number of LLMs to identify the optimal model selection. Extensive experiments across diverse tasks, including question answering, constrained text generation, and code execution, demonstrate that LLMSELECTOR confers 4%-73% accuracy gains for compound AI systems like Self-Refine and Multiagent-Debate with general-purpose models (e.g., GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro), and 3%-21% gains with frontier reasoning models (e.g., o3-mini, Claude 3.7 Sonnet, Gemini 2.0 Flash).
Submission Number: 36
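The abstract's claim of a call budget linear in the number of modules and LLMs suggests a per-module (coordinate-descent-style) search rather than brute force over all assignments. The sketch below is a hypothetical illustration of that idea, not the paper's actual algorithm: `run_system`, `evaluate`, and the greedy update loop are all assumptions introduced here for exposition.

```python
def select_models(modules, llms, run_system, evaluate, max_rounds=3):
    """Greedy per-module model selection (coordinate descent).

    Hypothetical sketch: each module's LLM is re-assigned to whichever
    choice maximizes an evaluator score, holding the other modules fixed.
    Each round costs O(len(modules) * len(llms)) system runs, i.e. a call
    budget linear in the number of modules and the number of LLMs, in
    contrast to the exponential cost of enumerating every assignment.
    """
    # Start from an arbitrary assignment: every module uses the first LLM.
    assignment = {m: llms[0] for m in modules}
    for _ in range(max_rounds):
        changed = False
        for m in modules:
            # Score the current assignment, then try every LLM for module m.
            best_llm = assignment[m]
            best_score = evaluate(run_system(assignment))
            for llm in llms:
                trial = dict(assignment, **{m: llm})
                score = evaluate(run_system(trial))
                if score > best_score:
                    best_llm, best_score = llm, score
            if assignment[m] != best_llm:
                assignment[m] = best_llm
                changed = True
        if not changed:  # No module changed: a local optimum was reached.
            break
    return assignment
```

With a toy two-module system (e.g. a generate module and a critique module) and a lookup-table evaluator, the loop converges to the per-module assignment with the highest table score.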