Abstract: Machine Learning-based Actionable Warning Identification (ML-based AWI) has attracted a great deal of research attention in recent years. A reliable warning dataset is critical for evaluating the effectiveness of ML-based AWI approaches. Unfortunately, the warning datasets used in the ML-based AWI community are automatically labeled by the closed-warning-based heuristic, which is prone to mislabeling. Such mislabeled warning datasets can lead to inaccurate evaluation of ML-based AWI approaches, and removing label noise from a warning dataset is a very challenging task. To address this problem, we propose an effective approach to reduce label errors in AWI. Specifically, we introduce Confident Learning (CL), a state-of-the-art noise estimation technique, into the ML-based AWI community. To mitigate the false positives introduced by CL, we incorporate the idea of ensemble learning to achieve more robust noise detection. Experimental results on a reliable warning dataset drawn from 10 real-world, large-scale projects show that our approach is effective in detecting label noise.
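The core idea described above (Confident Learning to flag likely-mislabeled examples, with ensemble majority voting to suppress CL's false positives) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline; the classifiers, thresholds, and voting rule are assumptions chosen for the sketch.

```python
# Hedged sketch of CL-style label-noise detection with ensemble voting.
# All model choices and thresholds here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=600, n_features=20, random_state=0)
y_noisy = y_true.copy()
flip = rng.choice(len(y_noisy), size=60, replace=False)  # inject 10% label noise
y_noisy[flip] = 1 - y_noisy[flip]

def cl_flags(model):
    """Flag suspected label errors with a Confident Learning-style rule."""
    # Out-of-sample predicted probabilities via cross-validation, as in CL.
    probs = cross_val_predict(model, X, y_noisy, cv=5, method="predict_proba")
    idx = np.arange(len(y_noisy))
    self_conf = probs[idx, y_noisy]
    # Per-class threshold: mean self-confidence of examples given that label.
    thresh = np.array([self_conf[y_noisy == c].mean() for c in (0, 1)])
    other_conf = probs[idx, 1 - y_noisy]
    # Suspect an example when the model distrusts its given label but
    # confidently prefers the opposite class.
    return (self_conf < thresh[y_noisy]) & (other_conf > thresh[1 - y_noisy])

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(random_state=0),
    GaussianNB(),
]
votes = np.sum([cl_flags(m) for m in models], axis=0)
# Majority vote across the ensemble reduces single-model false positives.
suspected = np.flatnonzero(votes >= 2)
precision = float(np.isin(suspected, flip).mean())
```

On this synthetic setup, `suspected` should concentrate on the flipped indices; the voting threshold trades recall for precision, mirroring the paper's motivation for combining CL with ensembling.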