LLMServingSim: A Simulation Infrastructure for LLM Inference Serving Systems

Published: 30 May 2024, Last Modified: 08 Jun 2024 · MLArchSys 2024 · Oral/Poster · CC BY 4.0
Workshop Track: Architecture 2.0
Presentation: In-Person
Keywords: Large language model, System simulator, System for machine learning
Presenter Full Name: Jaehong Cho
Presenter Email: jhcho@casys.kaist.ac.kr
Abstract: Recently, there has been a large research effort in building efficient large language model (LLM) inference serving systems, including advancements in both hardware and software. Nevertheless, there is a lack of simulation infrastructure capable of accurately modeling hardware-software system behaviors without extensively extending simulation time. This paper aims to address the limitations of existing system simulators and develop an effective simulation tool, called LLMServingSim, to support future research in LLM inference serving systems. In designing LLMServingSim, we focus on two algorithmic properties: (1) the dynamic variation in workload characteristics of LLM inference serving due to its autoregressive nature, and (2) the need for detailed memory modeling due to the large key-value (KV) cache generated during runtime inference serving. This paper describes the key challenges in bridging the "real2sim" gap and presents our initial strategies to address them. It also discusses the problems that remain unresolved.
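To make the second point concrete, the sketch below estimates the KV cache footprint that a simulator would need to track per request; it uses the standard formula (2 tensors × layers × hidden size × sequence length × bytes per element) with illustrative GPT-3-scale parameters that are assumptions, not values taken from the paper.

```python
def kv_cache_bytes(num_layers: int, hidden_size: int, seq_len: int,
                   batch_size: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size: keys and values (hence the factor 2) are
    stored per layer, per token, per batch entry."""
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_elem

# Hypothetical GPT-3 175B-like configuration (96 layers, 12288 hidden dim),
# FP16 storage, a single request with a 2048-token context.
print(kv_cache_bytes(96, 12288, 2048, 1) / 2**30, "GiB")  # ~9 GiB per request
```

Because this footprint grows with every generated token and with batch size, memory pressure shifts continuously during autoregressive decoding, which is why the abstract argues that coarse, static memory models are insufficient for LLM serving simulation.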
Presenter Bio: Undergraduate student in the School of Computing, KAIST; undergraduate student intern at the CASYS Lab, KAIST.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 16