Abstract: Machine Learning-based Actionable Warning Identification (ML-based AWI) has attracted a great deal of research attention in recent years. A reliable warning dataset is critical for evaluating the effectiveness of ML-based AWI approaches. Unfortunately, the warning datasets used in the ML-based AWI community are automatically labeled by the closed-warning-based heuristic, which is prone to mislabeling. Such mislabeled warning datasets can lead to inaccurate evaluation of ML-based AWI approaches, and removing label noise from a warning dataset is a very challenging task. To address this problem, we propose an effective approach to reduce label errors in AWI. Specifically, we introduce Confident Learning (CL), a state-of-the-art noise estimation technique, into the ML-based AWI community. To mitigate the false positives introduced by CL, we incorporate the idea of ensemble learning to achieve more robust noise detection. Experimental results on a reliable warning dataset drawn from 10 real-world, large-scale projects show that our approach is effective in detecting label noise.
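The core idea described above (Confident Learning to flag likely-mislabeled examples, with ensemble majority voting to suppress CL's false positives) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline; the classifiers, thresholds, and voting rule are assumptions chosen for the sketch.

```python
# Hedged sketch of CL-style label-noise detection with ensemble voting.
# All model choices and thresholds here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=600, n_features=20, random_state=0)
y_noisy = y_true.copy()
flip = rng.choice(len(y_noisy), size=60, replace=False)  # inject 10% label noise
y_noisy[flip] = 1 - y_noisy[flip]

def cl_flags(model):
    """Flag suspected label errors with a Confident Learning-style rule."""
    # Out-of-sample predicted probabilities via cross-validation, as in CL.
    probs = cross_val_predict(model, X, y_noisy, cv=5, method="predict_proba")
    idx = np.arange(len(y_noisy))
    self_conf = probs[idx, y_noisy]
    # Per-class threshold: mean self-confidence of examples given that label.
    thresh = np.array([self_conf[y_noisy == c].mean() for c in (0, 1)])
    other_conf = probs[idx, 1 - y_noisy]
    # Suspect an example when the model distrusts its given label but
    # confidently prefers the opposite class.
    return (self_conf < thresh[y_noisy]) & (other_conf > thresh[1 - y_noisy])

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(random_state=0),
    GaussianNB(),
]
votes = np.sum([cl_flags(m) for m in models], axis=0)
# Majority vote across the ensemble reduces single-model false positives.
suspected = np.flatnonzero(votes >= 2)
precision = float(np.isin(suspected, flip).mean())
```

On this synthetic setup, `suspected` should concentrate on the flipped indices; the voting threshold trades recall for precision, mirroring the paper's motivation for combining CL with ensembling.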