Advancing Root Cause Analysis in Cloud-native System with Knowledge Graph Path Embedding Translation
Abstract: Cloud computing technologies, including cloud-native architectures and containerization, have gained prominence in recent years owing to their scalability, improved resource utilization, and rapid deployment. However, their inherent complexity and the intricate interplay of internal components heighten the risk of sporadic and unforeseen anomalies. Root Cause Analysis (RCA) addresses this challenge by identifying the problematic services (pods) and the specific faults behind observed anomalies. To overcome the limitations of conventional RCA algorithms, we propose a novel approach that jointly models operation entities and their relationships as learnable embeddings. The method also integrates fault propagation information to further improve RCA accuracy. We evaluate the approach with a prototype built on a Kubernetes cloud-native system, and extensive experimental results validate its efficacy.