Semi-Supervised Metrics-Based Self-Training Root Cause Analysis for Cloud-Native Systems with Class-Imbalanced Data
Abstract: Root cause analysis is crucial for cloud-native systems. However, existing supervised approaches ignore the potential of unlabeled data, which is abundant in cloud-native root cause analysis scenarios. Moreover, the class-imbalanced distribution of faults hinders the application of semi-supervised learning. To overcome these limitations, we propose STRCA, a metrics-based semi-supervised self-training approach for root cause analysis. STRCA employs minority-priority self-training, which selects high-quality pseudo-labels across self-training generations. In addition, stepwise distribution alignment rebalances the predicted distribution with gradually decreasing strength. Together, these two strategies mitigate class imbalance in semi-supervised learning. Experiments on a public dataset show the effectiveness of STRCA with limited labels.
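The two strategies named in the abstract can be illustrated with a minimal sketch. This is not STRCA's implementation: the abstract does not give the formulas, so the linear decay schedule, the ratio-based alignment rule, the per-class keep rate, and all function names below are assumptions chosen only to show the general idea.

```python
import numpy as np


def alignment_strength(generation, total_generations):
    """Assumed schedule: linear decay from 1.0 (first generation) to 0.0 (last)."""
    if total_generations <= 1:
        return 1.0
    return 1.0 - generation / (total_generations - 1)


def align_distribution(probs, target_dist, strength, eps=1e-12):
    """Rebalance predicted class probabilities toward target_dist.

    Assumed rule: scale each class by (target / mean prediction) ** strength,
    then renormalize each row. strength = 0 leaves the predictions unchanged.
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target_dist, dtype=float)
    mean_pred = probs.mean(axis=0)
    ratio = (target + eps) / (mean_pred + eps)
    adjusted = probs * ratio ** strength
    return adjusted / adjusted.sum(axis=1, keepdims=True)


def select_minority_priority(probs, labeled_counts):
    """Pick pseudo-labels, keeping a larger share for rarer classes.

    Assumed rule: class c keeps the top (min_count / count_c) fraction of the
    samples predicted as c, ranked by confidence. All counts must be positive.
    """
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(labeled_counts, dtype=float)
    keep_rate = counts.min() / counts
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    selected = []
    for c in range(probs.shape[1]):
        idx = np.flatnonzero(pred == c)
        n_keep = int(np.ceil(keep_rate[c] * idx.size))
        # Highest-confidence predictions of class c come first.
        top = idx[np.argsort(-conf[idx])[:n_keep]]
        selected.extend(top.tolist())
    return sorted(selected), pred


# Toy data: four samples predicted as the majority class 0, one as class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
aligned = align_distribution(probs, [0.5, 0.5], alignment_strength(0, 5))
chosen, pseudo_labels = select_minority_priority(probs, labeled_counts=[40, 10])
```

In this toy run the majority class keeps only its single most confident prediction, while the lone minority-class prediction is kept in full. The alignment step shifts probability mass toward the under-predicted class when the strength is high and has no effect once the strength has decayed to zero.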