GAMMA: Graph Neural Network-Based Multi-Bottleneck Localization for Microservices Applications

Published: 01 Jan 2024, Last Modified: 19 May 2025WWW 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Microservices architecture is quickly replacing monolithic and multi-tier architectures as the implementation choice for large-scale web applications as it allows independent development, scalability, and maintenance. However, even with careful node scheduling and scaling, the microservices applications are still vulnerable to performance degradation due to unexpected (dependent or independent) events like anomalous node behavior, workload interference, or sudden spikes in requests or retries. These events can adversely affect the performance of one or more microservices (bottlenecks), degrading the overall application performance. To ensure a good customer experience and avoid revenue loss, it is crucial to detect and mitigate all bottlenecks swiftly.This work introduces GAMMA, a novel, explainable graph learning model that integrates a mixture of experts to detect multiple bottlenecks. We evaluated GAMMA using a popular open-source benchmarking application deployed on Kubernetes under various practical bottleneck scenarios. Our experimental evaluation results show that GAMMA provides significantly better performance (46% higher F1 score) than existing works that employ deep learning, machine learning, and statistical techniques, demonstrating its ability to detect multiple bottlenecks by learning complex interactions in a microservices architecture.The dataset is made publicly available [49] for reproducibility and further research in the field.
Loading