FastPERT: towards fast microservice application latency prediction via structural inductive bias over PERT networks

Da Sun Handason Tam, Huanle Xu, Yang Liu, Siyue Xie, Wing Cheong Lau

Published: 25 Feb 2025; Last Modified: 06 May 2026; AAAI 2025; Everyone; arXiv.org perpetual, non-exclusive license

Abstract: The recent surge in popularity of cloud-native applications using microservice architectures has led to a focus on accurate end-to-end latency prediction for proactive resource allocation. Existing models apply Graph Transformers to Microservice Call Graphs or to Program Evaluation and Review Technique (PERT) graphs to capture complex temporal dependencies between microservices. However, these models incur a high computational cost during both training and inference. This paper introduces FastPERT, an efficient model for predicting end-to-end latency in microservice applications. FastPERT dissects an execution trace into several microservice tasks, using observations from prior execution traces of the application, akin to the PERT approach. Subsequently, a prediction model is constructed to estimate the completion time of each individual task. This information, coupled with the computational and structural inductive bias of the PERT graph, facilitates the efficient computation of the end-to-end latency of an execution trace. As a result, FastPERT can efficiently capture the complex temporal causality of different microservice tasks, leading to more accurate and robust latency predictions across a variety of applications. An evaluation based on datasets generated from large-scale Alibaba microservice traces reveals that FastPERT significantly improves training and inference efficiency without compromising performance, demonstrating its potential as a superior solution for real-time end-to-end latency prediction in cloud-native microservice applications.
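
To make the PERT idea in the abstract concrete, the following is a minimal sketch of the classic critical-path calculation: given a completion time for each task and the precedence edges between tasks, the end-to-end latency is the length of the longest path through the task graph. This is not FastPERT's implementation. The function name `end_to_end_latency`, the task names, and the durations are hypothetical, and the per-task times are supplied by hand here rather than by a learned prediction model.

```python
from collections import defaultdict, deque


def end_to_end_latency(durations, edges):
    """Return the critical-path length of a task DAG.

    durations: dict mapping task name -> completion time of that task
               (hypothetical values; a real system would predict these).
    edges:     iterable of (before, after) pairs meaning `after` cannot
               start until `before` has finished.
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Tasks with no predecessors can start at time 0.
    earliest_start = {task: 0.0 for task in durations}
    finish = {}
    ready = deque(task for task, deg in indegree.items() if deg == 0)

    # Process tasks in topological order (Kahn's algorithm).
    while ready:
        task = ready.popleft()
        finish[task] = earliest_start[task] + durations[task]
        for nxt in successors[task]:
            # A task starts only after its slowest predecessor finishes.
            earliest_start[nxt] = max(earliest_start[nxt], finish[task])
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(finish) != len(durations):
        raise ValueError("task graph contains a cycle")
    return max(finish.values(), default=0.0)
```

For a hypothetical trace in which `gateway` (2.0) fans out to `auth` (3.0) and `cart` (5.0), and both feed `render` (1.0), the function returns 8.0: the path through `cart` is the critical one. The whole computation is a single pass over the graph in topological order, which is the kind of structural shortcut a PERT-based formulation makes available.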