Semi-Supervised Metrics-Based Self-Training Root Cause Analysis for Cloud-Native Systems with Class-Imbalanced Data

Ying Huang, Qingfeng Du, Yongqi Han, Cheng He, Fulong Tian

Published: 01 Jan 2024, Last Modified: 06 Feb 2025. Venue: ICASSP 2024. License: CC BY-SA 4.0

Abstract: Root cause analysis is crucial for cloud-native systems. However, existing supervised approaches ignore the potential of unlabeled data, which is abundant in cloud-native root cause analysis scenarios. Moreover, the class-imbalanced distribution of faults hinders the application of semi-supervised learning. To overcome these limitations, we propose STRCA, a metrics-based semi-supervised self-training approach for root cause analysis. STRCA employs minority priority self-training, which selects high-quality pseudo-labels across generations. In addition, stepwise distribution alignment is introduced to rebalance the predicted distribution with gradually decreasing strength. Together, these two strategies mitigate class imbalance in semi-supervised learning. Experiments on a public dataset show the effectiveness of STRCA with limited labels.
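The abstract does not give STRCA's formulas, so the following is only a minimal sketch of how the two strategies could look, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the linear decay schedule, the power-scaled reweighting used for alignment, and the rule that lowers the confidence threshold for rarer classes.

```python
def alignment_strength(generation, num_generations):
    """Assumed schedule: decay linearly from 1.0 (first generation) to 0.0 (last)."""
    if num_generations <= 1:
        return 0.0
    return 1.0 - generation / (num_generations - 1)


def align_distribution(probs, model_marginal, target_marginal, strength):
    """Rescale one predicted distribution toward a target class marginal.

    strength = 0 leaves the prediction unchanged; strength = 1 applies the
    full ratio target/model. This reweighting is an assumption, not taken
    from the paper.
    """
    scaled = [
        p * (t / m) ** strength
        for p, m, t in zip(probs, model_marginal, target_marginal)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


def select_pseudo_labels(predictions, labeled_counts,
                         base_threshold=0.95, min_threshold=0.6):
    """Keep confident predictions, with a lower bar for rarer classes.

    predictions: list of (sample_id, probs) pairs.
    labeled_counts: number of labeled samples per class, indexed by class.
    The thresholds are hypothetical values chosen for this sketch.
    """
    largest = max(labeled_counts)
    selected = []
    for sample_id, probs in predictions:
        label = max(range(len(probs)), key=probs.__getitem__)
        # rarity is 0 for the majority class and approaches 1 for rare classes
        rarity = 1.0 - labeled_counts[label] / largest
        threshold = base_threshold - rarity * (base_threshold - min_threshold)
        if probs[label] >= threshold:
            selected.append((sample_id, label))
    return selected
```

In a self-training loop under these assumptions, each generation would compute `alignment_strength`, pass every unlabeled prediction through `align_distribution`, and add the output of `select_pseudo_labels` to the labeled set before retraining.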