iSafeRM: An Interpretable and Safety-Aware Resource Management Framework for Microservice Systems With SLO Guarantees
Abstract: Due to dynamic workloads and complex interactions among microservices, it is difficult for web service providers to elastically allocate cloud resources to each internal microservice so as to minimize total resource usage while meeting the service-level objective (SLO) constraint. Because proactive resource management relies on historical data and complex machine learning models to forecast future workload patterns and resource demands, recent efforts have shifted towards reactive resource management. However, previous studies still face serious challenges, including (1) accurately locating and explaining performance bottlenecks, (2) ensuring SLO compliance during online configuration exploration, and (3) identifying and configuring the performance-critical parameters inherent in bottleneck microservices. To overcome these challenges, this paper introduces iSafeRM, an interpretable and safety-aware resource management framework tailored for microservice-based systems. Upon detecting an SLO violation, iSafeRM automatically locates the bottleneck microservices and performs a safety-aware online configuration of both computing resources and performance-critical parameters. Experiments on our laboratory Docker cluster show that, compared with four representative baselines, iSafeRM reduces resource usage by an average of 7.5% to 19.1%, 9.3% to 23.0%, and 18.7% to 31.0% on the three applications, respectively, while maintaining the lowest probability of SLO violations. We also demonstrate iSafeRM’s effectiveness under dynamic workloads.
External IDs: dblp:journals/tsc/DouLFZZ25