𝜆Grapher: A Resource-Efficient Serverless System for GNN Serving through Graph Sharing

Published: 23 Jan 2024, Last Modified: 23 May 2024 · TheWebConf24 Oral
Keywords: Serverless Computing, Graph Neural Networks, Model Serving
Abstract: Graph Neural Networks (GNNs) have been increasingly adopted for graph analysis in web applications such as social networks. Yet, efficient GNN serving remains a critical challenge due to high workload fluctuations and intricate GNN operations. Serverless computing, thanks to its flexibility and agility, offers on-demand serving of GNN inference requests. Alas, the request-centric serverless model is still too coarse-grained to avoid resource waste. Observing the significant data locality in the computation graphs of requests, we propose 𝜆Grapher, a serverless system for GNN serving that achieves resource efficiency through graph sharing and fine-grained resource allocation. 𝜆Grapher features the following designs: (1) adaptive timeout for request buffering to balance resource efficiency and inference latency, (2) graph-centric scheduling to minimize computation and memory redundancy, and (3) resource-centric function management, with fine-grained resource allocation tailored to the resource sensitivities of GNN operations and function orchestration optimized to hide communication latency. We implement a prototype of 𝜆Grapher based on the representative open-source serverless platform Knative and evaluate it with real-world traces from various web applications. Our results show that 𝜆Grapher can save up to 54.2% of memory resources and 45.3% of computing resources compared with the state of the art while meeting GNN inference latency requirements.
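To make the first design point concrete, below is a minimal Python sketch of adaptive-timeout request buffering: requests are briefly held so that those whose computation graphs overlap can be served together, and the buffering window shrinks when observed latency nears the latency budget and relaxes when there is headroom. All names here (Request, AdaptiveBuffer, slo_ms, etc.) are illustrative assumptions for exposition, not 𝜆Grapher's actual API or algorithm.

```python
# Hypothetical sketch of adaptive-timeout request buffering (not from the paper).
import time
from dataclasses import dataclass, field


@dataclass
class Request:
    """A GNN inference request identified by its target node(s)."""
    target_nodes: list
    arrival: float = field(default_factory=time.monotonic)


class AdaptiveBuffer:
    def __init__(self, slo_ms=100.0, min_timeout_ms=2.0, max_timeout_ms=50.0):
        self.slo_ms = slo_ms                  # assumed per-request latency budget
        self.timeout_ms = max_timeout_ms / 2  # current buffering window
        self.min_timeout_ms = min_timeout_ms
        self.max_timeout_ms = max_timeout_ms
        self.pending = []

    def add(self, req):
        """Buffer an incoming request for potential graph sharing."""
        self.pending.append(req)

    def should_flush(self, now=None):
        """Flush once the oldest buffered request has waited past the window."""
        if not self.pending:
            return False
        now = time.monotonic() if now is None else now
        oldest_wait_ms = (now - self.pending[0].arrival) * 1000.0
        return oldest_wait_ms >= self.timeout_ms

    def flush(self):
        """Hand the buffered requests to the scheduler as one shared batch."""
        batch, self.pending = self.pending, []
        return batch

    def report_latency(self, observed_ms):
        """Adapt the window: back off near the SLO, relax when there is headroom."""
        if observed_ms > 0.8 * self.slo_ms:
            self.timeout_ms = max(self.min_timeout_ms, self.timeout_ms * 0.5)
        else:
            self.timeout_ms = min(self.max_timeout_ms, self.timeout_ms * 1.2)
```

In this sketch, a larger window increases the chance that requests touching overlapping subgraphs are batched (saving memory and computation), while the feedback in report_latency caps the added queuing delay; the actual policy used by 𝜆Grapher may differ.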
Track: Systems and Infrastructure for Web, Mobile, and WoT
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 414