Revisiting Service Level Objectives and System Level Metrics in Large Language Model Serving

ACL ARR 2025 February Submission5805 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Large language models (LLMs) have achieved remarkable performance and are widely deployed across applications, but serving LLM inference raises the challenge of maintaining a good user experience while achieving sufficient throughput. Balancing these two factors is crucial for reducing operational costs without degrading service quality. Accordingly, service level objectives (SLOs) and system level metrics have been introduced as key performance measures for LLM serving. However, current metrics fall short of accurately capturing user experience. We identify two notable issues: 1) manually delaying the delivery of some tokens can improve the metrics of those requests, and 2) actively abandoning requests that do not meet SLOs can improve system level metrics. In this paper, we revisit SLOs and system level metrics in LLM serving and propose a comprehensive metric framework called smooth goodput, which integrates SLOs and system level metrics to reflect the nature of user experience in LLM serving. The framework is adaptable, with parameters that can be tailored to the specific objectives of different tasks. Using this unified framework, we reassess the performance of different LLM serving systems under multiple workloads. We hope this framework will establish a standardized method for evaluating LLM serving and thereby encourage cohesive progress in future research.
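The two issues named in the abstract can be illustrated with a toy sketch. This example is not taken from the paper: the function names, the traces, and the thresholds (a 500 ms limit on the gap between tokens, a 1500 ms per-request deadline, a single FIFO server) are all assumptions chosen for illustration, and the paper's own metric definitions may differ.

```python
from typing import List


def max_gap(times_ms: List[int]) -> int:
    """Largest interval between consecutive token deliveries, in ms."""
    return max(b - a for a, b in zip(times_ms, times_ms[1:]))


def paced_delivery(generated_ms: List[int], interval_ms: int) -> List[int]:
    """Hold tokens back so they are delivered at least `interval_ms` apart,
    never earlier than they were generated."""
    out: List[int] = []
    for t in generated_ms:
        earliest = out[-1] + interval_ms if out else t
        out.append(max(t, earliest))
    return out


def served_within_slo(arrivals_ms: List[int], service_ms: int,
                      deadline_ms: int, abandon: bool) -> int:
    """Single FIFO server. Count requests that finish within `deadline_ms`
    of their arrival. With `abandon=True`, a request that would miss its
    deadline is dropped without consuming any server time."""
    clock = 0
    met = 0
    for arrival in arrivals_ms:
        finish = max(clock, arrival) + service_ms
        if finish - arrival <= deadline_ms:
            met += 1
            clock = finish
        elif not abandon:
            clock = finish  # the late request is still served in full
    return met


# Issue 1: a stall before the fourth token creates a 1000 ms gap.
generated = [100, 200, 300, 1300, 1400]
paced = paced_delivery(generated, interval_ms=400)

# Issue 2: three requests arrive together, a fourth arrives at 2000 ms.
arrivals = [0, 0, 0, 2000]
met_all_served = served_within_slo(arrivals, 1000, 1500, abandon=False)
met_with_drops = served_within_slo(arrivals, 1000, 1500, abandon=True)
```

In the first trace, every token in `paced` arrives no earlier than in `generated`, yet the largest gap falls from 1000 ms to 400 ms, so a per-gap limit of 500 ms is satisfied only by the deliberately delayed stream. In the second trace, dropping the two requests that would miss their deadline lets the fourth request finish in time, so the count of SLO-compliant requests rises from 1 to 2 even though two users receive no response at all.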
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: evaluation and metrics, applications
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 5805