Unsupervised Software Defect Prediction Through Multiview Clustering

Published: 2025, Last Modified: 26 Jan 2026. IEEE Trans. Reliab. 2025. License: CC BY-SA 4.0
Abstract: The core goal of software defect prediction (SDP) is to identify modules with a high likelihood of defects, so that quality assurance activities can be prioritized with low inspection effort. Supervised defect prediction models have been studied extensively. However, they require labeled data to obtain enough training modules, and labeling consumes substantial human effort. Cross-project defect prediction instead reuses models trained on other projects that have enough historical data, but this strategy is often hindered by large distribution differences between projects and by data privacy concerns. Unsupervised learning is an alternative when data are unlabeled, but existing work mainly performs single-view prediction by concatenating all software metrics, which ignores the diversity and complementarity of different types of metrics. This study proposes a novel approach, multiview unsupervised software defect prediction (MUSDP), which collaboratively learns from the diversity and complementarity of different views to build a robust and reliable defect prediction model. Extensive experiments on 28 releases from eight software projects indicate that MUSDP achieves superior or comparable results in terms of G-mean, AUC, $P_{\text{opt}}$, and Recall@20% compared with competing supervised and unsupervised methods. In the interpretation of MUSDP, the numbers of added and deleted lines significantly influence its predictions.
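As an illustration of two of the evaluation measures named in the abstract, the following is a minimal sketch, not the authors' evaluation code. It assumes G-mean is the geometric mean of recall and specificity, and that Recall@20% is the fraction of defective modules found when modules are inspected in descending order of predicted risk until 20% of the total lines of code are spent. The paper's exact definitions may differ, and the names `g_mean`, `recall_at_effort`, `scores`, and `loc` are hypothetical.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of recall and specificity (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def recall_at_effort(y_true, scores, loc, budget=0.2):
    """Share of defective modules found within a LOC inspection budget.

    Assumption: modules are inspected from highest to lowest score, and
    inspection stops before the first module that would exceed the budget.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(loc)
    spent = 0
    found = 0
    for i in order:
        if spent + loc[i] > limit:
            break
        spent += loc[i]
        found += y_true[i]
    total_defective = sum(y_true)
    return found / total_defective if total_defective else 0.0


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0]
predicted = [1, 0, 0, 0]
risk = [0.9, 0.8, 0.1, 0.2]
sizes = [10, 50, 20, 20]
print(g_mean(labels, predicted))
print(recall_at_effort(labels, risk, sizes))
```

For an unsupervised method, `scores` would come from whatever ranking the clustering step produces. The sketch only shows how such a ranking could be scored once labels are available for evaluation.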