Towards Efficient Compound Large Language Model System Serving in the Wild

Yifei Zhu, Botao Zhu, Chen Chen, Xiaoyi Fan

Published: 2024, Last Modified: 25 Jan 2026, Venue: IWQoS 2024, License: CC BY-SA 4.0

Abstract: Compound Large Language Model (LLM) systems, rather than a single monolithic LLM, are gradually becoming a practical way to realize a diverse range of industry applications. In a compound LLM system, an LLM collaborates with external tools, APIs, or other LLMs to offer intelligent services. In this poster, we identify the unique challenges that compound LLM systems bring to serving, namely temporal and topological uncertainty. We then propose a priority-based scheduling policy for scheduling the stages of compound LLM systems represented as directed acyclic graphs (DAGs). Preliminary results show that uncertainty-aware scheduling policies achieve promising performance.
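
The abstract does not specify the priority function, so the following is only a minimal sketch of what priority-based scheduling over a DAG of stages could look like, not the authors' policy. It assumes a single worker, a hypothetical `est_cost` table of per-stage duration estimates, and a priority equal to the estimated longest remaining path to a sink (a classic critical-path heuristic chosen here purely for illustration). The function name `schedule` and the stage names are likewise made up.

```python
import heapq
from functools import lru_cache


def schedule(dag, est_cost):
    """Return an execution order for the stages of one request on one worker.

    dag:      dict mapping each stage to the list of its downstream stages.
    est_cost: dict mapping each stage to an estimated duration (assumed input;
              in practice these estimates are uncertain).
    """

    @lru_cache(maxsize=None)
    def remaining(stage):
        # Estimated cost of the longest path from this stage to any sink.
        return est_cost[stage] + max(
            (remaining(child) for child in dag[stage]), default=0.0
        )

    # Count unfinished upstream stages for every stage.
    indegree = {stage: 0 for stage in dag}
    for children in dag.values():
        for child in children:
            indegree[child] += 1

    # heapq is a min-heap, so priorities are negated to pop the largest first.
    # The stage name breaks ties deterministically.
    ready = [(-remaining(s), s) for s, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, stage = heapq.heappop(ready)
        order.append(stage)
        for child in dag[stage]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, (-remaining(child), child))
    return order


# Hypothetical compound system: an LLM planner fans out to a search tool and
# a calculator, and a final LLM call summarizes both results.
example_dag = {
    "plan": ["search", "calc"],
    "search": ["summarize"],
    "calc": ["summarize"],
    "summarize": [],
}
example_cost = {"plan": 1.0, "search": 3.0, "calc": 0.5, "summarize": 2.0}
print(schedule(example_dag, example_cost))
```

This sketch treats the graph and the cost estimates as fixed. An uncertainty-aware policy of the kind the abstract describes would presumably need to revise priorities as actual stage durations are observed and as the set of stages that will run becomes known; that is not modeled here.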