Abstract: Large Language Models (LLMs) have given rise to innovative application requirements. Locally deploying LLMs in micro-enterprises mitigates potential issues such as privacy infringement and sluggish responses. However, such deployments are hampered by the limited computing capability and memory capacity of the available devices. We introduce PipeLLM, which allocates the model across devices commensurate with their computing capabilities and enables the parallel execution of layers by slicing the input sequence along the token dimension. PipeLLM demonstrates the potential to accelerate LLM inference on heterogeneous devices, offering a solution for LLM deployment in micro-enterprise hardware environments.
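To make the core idea concrete, the sketch below illustrates (in PyTorch) what the abstract describes: the model's layers are split into per-device stages, and the input sequence is sliced along the token dimension so that different chunks occupy different stages at once. This is a hypothetical illustration, not the PipeLLM source: the stage contents, chunk count, and the single-host "clock" loop that stands in for true cross-device concurrency are all assumptions, and real transformer attention layers would additionally need key/value state forwarded between chunks.

```python
"""Minimal sketch of token-dimension pipeline parallelism (hypothetical;
not the PipeLLM implementation)."""
import torch
import torch.nn as nn

HIDDEN = 64

# Hypothetical per-device stages: a stronger device is assigned more layers,
# mirroring allocation commensurate with computing capability.
stages = [
    nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                  nn.Linear(HIDDEN, HIDDEN), nn.ReLU()),  # stronger device
    nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU()),  # weaker device
]

def pipelined_forward(x: torch.Tensor, num_chunks: int) -> torch.Tensor:
    """Slice a (seq_len, hidden) input along the token dimension and step
    chunks through the stages in a pipeline schedule: at clock tick t,
    stage s processes chunk t - s, so consecutive chunks occupy different
    stages concurrently (simulated here sequentially on one host)."""
    chunks = list(torch.chunk(x, num_chunks, dim=0))
    num_stages = len(stages)
    done = [None] * len(chunks)
    for t in range(len(chunks) + num_stages - 1):   # pipeline clock
        for s in reversed(range(num_stages)):       # drain later stages first
            c = t - s
            if 0 <= c < len(chunks):
                chunks[c] = stages[s](chunks[c])
                if s == num_stages - 1:
                    done[c] = chunks[c]
    return torch.cat(done, dim=0)

if __name__ == "__main__":
    seq = torch.randn(16, HIDDEN)                  # 16 tokens
    out = pipelined_forward(seq, num_chunks=4)
    print(out.shape)                               # torch.Size([16, 64])
```

With C chunks and S stages, the schedule takes C + S - 1 ticks instead of the C * S of fully serial execution, which is the source of the speedup the abstract claims for heterogeneous multi-device inference.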