Abstract: Compound Large Language Model (LLM) systems, rather than a single monolithic LLM, are becoming a practical way to build a diverse range of industry applications. In a compound LLM system, an LLM collaborates with external tools, APIs, or other LLMs to offer intelligent services. In this poster, we identify two challenges unique to serving compound LLM systems: temporal uncertainty and topological uncertainty. We then propose a priority-based scheduling policy for the stages of compound LLM systems represented as directed acyclic graphs (DAGs). Preliminary results show promising performance for uncertainty-aware scheduling policies.
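To make the idea of priority-based scheduling over a DAG of stages concrete, the following is a minimal Python sketch. It is not the policy proposed in the poster, which the abstract does not specify. It assumes a hypothetical priority (the estimated remaining work on the longest downstream path), a fixed DAG known in advance, and made-up point estimates of stage latency. Under temporal and topological uncertainty, neither the estimates nor the graph would be known exactly. The names `Stage`, `remaining_work`, and `schedule`, and the example stages, are illustrative only.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    est_latency: float                        # hypothetical latency estimate (seconds)
    deps: list = field(default_factory=list)  # names of upstream stages


def build_children(stages):
    """Map each stage name to the names of its downstream stages."""
    children = {name: [] for name in stages}
    for stage in stages.values():
        for dep in stage.deps:
            children[dep].append(stage.name)
    return children


def remaining_work(stages, children):
    """Estimated latency of the longest path from each stage to a sink."""
    memo = {}

    def visit(name):
        if name not in memo:
            tail = max((visit(c) for c in children[name]), default=0.0)
            memo[name] = stages[name].est_latency + tail
        return memo[name]

    for name in stages:
        visit(name)
    return memo


def schedule(stages):
    """Serial order: among ready stages, run the one with most remaining work."""
    children = build_children(stages)
    priority = remaining_work(stages, children)
    pending = {name: len(s.deps) for name, s in stages.items()}
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    ready = [(-priority[n], n) for n, count in pending.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in children[name]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (-priority[child], child))
    return order


# Hypothetical compound system: an LLM plans, then a tool call and a second
# LLM call run as independent stages, then a final LLM call merges results.
stages = {
    "plan": Stage("plan", 0.5),
    "search": Stage("search", 0.3, ["plan"]),
    "code": Stage("code", 2.0, ["plan"]),
    "answer": Stage("answer", 1.0, ["search", "code"]),
}
order = schedule(stages)
print(order)  # ['plan', 'code', 'search', 'answer']
```

In this example, `code` runs before `search` because it sits on the longer estimated path to the final stage. An uncertainty-aware policy would presumably replace the fixed `est_latency` values with estimates that are revised as stages run, and would handle stages whose successors are only revealed at run time.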