Abstract: Software defect prediction (SDP) aims to identify potentially defective code so that testing resources can be allocated effectively and software reliability improved. In this study, we present a novel defect prediction framework, EDP-BGCNN, which uses BERT and graph convolutional neural networks to represent code. Our approach first extracts the code’s structural semantic features from its abstract syntax tree (AST), then applies BERT for embedding learning to extract the code’s semantic features. We then use latent Dirichlet allocation (LDA) to extract descriptive semantic features and convert them into a numeric vector. The code and descriptive semantic features are combined and processed by GraphSMOTE to address the class imbalance problem. Finally, we obtain a more comprehensive representation using graph convolutional neural networks. We evaluated our approach on five open-source projects and compared it with three state-of-the-art deep-learning methods. Our experimental results demonstrate that, on average, EDP-BGCNN achieves significant improvements of 4.9% to 23% in AUC and 6.6% to 10.7% in F1-measure.
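As a rough illustration of the first and last stages of the pipeline described above (AST extraction and graph convolution), the following is a minimal sketch and not the EDP-BGCNN implementation. It rests on several assumptions that do not come from the abstract: the input is Python source parsed with the standard `ast` module, one-hot node-type vectors stand in for the BERT embeddings, a fixed placeholder matrix stands in for learned weights, and the layer uses the common symmetric-normalised propagation rule, which may differ from the variant the authors use. The LDA and GraphSMOTE stages are omitted, and the function names `ast_to_graph`, `normalized_adjacency`, and `gcn_layer` are hypothetical.

```python
import ast
import math


def ast_to_graph(source):
    """Parse source code and return (node_labels, edges) for its AST.

    Each visited node gets a fresh index, so the result is always a tree
    with len(labels) - 1 parent-to-child edges.
    """
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, edges


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense matrix (edges made undirected)."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each node keeps its own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_hat, features, weights):
    """One propagation step: ReLU(a_hat @ features @ weights)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Usage: build a graph from a tiny function and run one layer over it.
labels, edges = ast_to_graph("def add(a, b):\n    return a + b\n")
vocab = sorted(set(labels))
# Assumption: one-hot node-type features replace BERT embeddings here.
features = [[1.0 if label == name else 0.0 for name in vocab] for label in labels]
# Assumption: a fixed placeholder weight matrix (len(vocab) x 2), not learned.
weights = [[0.5, -0.5] for _ in vocab]
a_hat = normalized_adjacency(len(labels), edges)
node_embeddings = gcn_layer(a_hat, features, weights)
```

In a trained model the weight matrix would be learned and several such layers stacked; the sketch only shows how each node's representation is mixed with those of its AST neighbours.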