Improving effort-aware defect prediction by directly learning to rank software modules

Xiao Yu, Jiqing Rao, Lei Liu, Guancheng Lin, Wenhua Hu, Jacky Wai Keung, Junwei Zhou, Jianwen Xiang

Published: 01 Jan 2024, Last Modified: 13 Nov 2024Inf. Softw. Technol. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Context:Effort-Aware Defect Prediction (EADP) ranks software modules according to the defect density of software modules, which allows testers to find more bugs while reviewing a certain amount of Lines Of Code (LOC). Most existing methods regard the EADP task as a regression or classification problem. Optimizing the regression loss or classification accuracy might result in poor effort-aware performance.Objective:Therefore, we propose a method called EALTR to improve the EADP performance by directly maximizing the Proportion of the found Bugs (PofB@20%) value when inspecting the top 20% LOC.Method:EALTR uses the linear regression model to build the EADP model, and then employs the composite differential evolution algorithm to generate a set of coefficient vectors for the linear regression model. Finally, EALTR selects the coefficient vector that achieves the highest PofB@20% value on the training dataset to construct the EADP model. To further reduce the Initial False Alarms (IFA) value of EALTR, we propose a re-ranking strategy in the prediction phase.Results:Our experimental results on eleven project datasets with 41 releases show that EALTR can find 5.83%–54.47% more bugs than the baseline methods whose IFA values are less than 10 and the re-ranking strategy significantly reduces the IFA value by 16.95%.Conclusion:Our study verifies the effectiveness of directly optimizing the effort-aware metric (i.e., PofB@20%) to build the EADP model. EALTR is recommended as an effective EADP method, since it can help software testers find more bugs.