Abstract: Sampling-based class imbalance methods are prone to losing important data, and current software defect prediction methods do not identify potential defect risks from the binary classification results. To address these problems, a defect risk identification method based on sample category perception was proposed. First, the noise samples, feature distributions, high-dimensional features, and class imbalance of multi-source sample sets were processed to obtain cleaned samples. Then, the category perception and perception coefficients of the cleaned samples were calculated to obtain the category perception product, from which the defect risk identification (DRI) model was constructed. Finally, risk identification probabilities were calculated with the DRI model to identify potential risks. The experimental results show that the proposed method performs well in terms of accuracy, F1-score, and Matthews correlation coefficient. The obtained risk identification probabilities and defect risk levels are consistent with the actual situation of real samples, accurately reflecting the severity of defects and identifying potential risks.