Identifying Root Cause of Bugs by Capturing Changed Code Lines With Relational Graph Neural Networks
Abstract: Just-In-Time (JIT) defect prediction models help development teams improve software quality and efficiency by assessing in real time whether code changes submitted by developers are likely to introduce defects, so that potential issues can be identified at the commit stage. Training a JIT model in turn depends on identifying and labeling which commits introduce defects. Because any deleted or added line in a bug-fixing commit may be related to the root cause of the introduced bug, current work faces two main challenges: 1) a lack of effective integration of heterogeneous graph information, and 2) a lack of semantic relationships between changed code lines. To address these challenges, we propose JIT-Finder, a method that uses a relational graph convolutional network to capture the semantic relationships between changed code lines. JIT-Finder detects root-cause deletion lines among the changed code lines, thereby identifying the root cause of the introduced bug in a bug-fixing commit. Specifically, JIT-Finder consists of three components: the graph construction component, the graph type conversion component, and the root cause detection component. The graph construction component analyzes the source code of a bug-fixing commit and builds a heterogeneous graph representation, extracting added/deleted nodes from the added/deleted lines and extracting edges from the relationships between those nodes. Next, to address the varying feature dimensions and the difficulty of integrating information in the heterogeneous graph of changed code lines, the graph type conversion component merges the different types of nodes/edges into a unified set of nodes/edges and encodes the type of each node/edge as an additional vector. This process converts the heterogeneous graph data into homogeneous graph data while preserving the type characteristics of the different nodes and edges, which makes it easier to integrate information across them. Finally, the root cause detection component uses a node embedding layer to obtain embedding vectors for the corresponding code statements, followed by a relational graph convolutional layer that captures the semantic relationships between the changed code lines and generates prediction labels. The root-cause deletion lines in the bug-fixing commit are then identified by a ranking layer applied to the deleted nodes. To evaluate the effectiveness of JIT-Finder, we used three datasets that contain high-quality bug-fixing and bug-introducing commits. Extensive experiments were conducted on data collected from 87 open-source projects, including 675 bug-fixing commits. The experimental results show that, compared with state-of-the-art root cause detection methods, JIT-Finder improved Recall@1, Recall@2, Recall@3, and MFR by 4.11%, 5.11%, 4.29%, and 14.41%, respectively.
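To make the graph type conversion and the ranking-based evaluation more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It merges typed node and edge sets into one unified set while keeping each type as a one-hot vector, then ranks deleted nodes by score. The edge type names (`data_flow`, `control_flow`), the function names, and the example scores are hypothetical; the relational graph convolutional layer itself is not reproduced, so the scores merely stand in for its output. MFR is read here as mean first rank and Recall@k as the fraction of root-cause lines ranked in the top k; both readings are assumptions, since the abstract does not define them.

```python
from typing import Dict, List, Optional, Set, Tuple

NodeKey = Tuple[str, str]  # (node type, local id), e.g. ("deleted", "d1")


def one_hot(position: int, size: int) -> List[int]:
    """Return a one-hot list of length `size` with a 1 at `position`."""
    return [1 if i == position else 0 for i in range(size)]


def to_homogeneous(
    nodes_by_type: Dict[str, List[str]],
    edges_by_type: Dict[str, List[Tuple[NodeKey, NodeKey]]],
):
    """Merge typed nodes/edges into unified lists, keeping types as one-hot vectors."""
    node_types = sorted(nodes_by_type)
    edge_types = sorted(edges_by_type)
    index: Dict[NodeKey, int] = {}         # typed node -> global node id
    node_type_vecs: List[List[int]] = []   # one type vector per global node
    for t_pos, t in enumerate(node_types):
        for local_id in nodes_by_type[t]:
            index[(t, local_id)] = len(node_type_vecs)
            node_type_vecs.append(one_hot(t_pos, len(node_types)))
    edges: List[Tuple[int, int, List[int]]] = []  # (src, dst, edge type vector)
    for r_pos, r in enumerate(edge_types):
        for src, dst in edges_by_type[r]:
            edges.append((index[src], index[dst], one_hot(r_pos, len(edge_types))))
    return index, node_type_vecs, edges


def rank_nodes(scores: Dict[str, float]) -> List[str]:
    """Order deleted nodes from highest to lowest predicted score."""
    return sorted(scores, key=lambda n: scores[n], reverse=True)


def first_rank(scores: Dict[str, float], root_causes: Set[str]) -> Optional[int]:
    """1-based rank of the first true root-cause line, or None if absent."""
    for pos, node in enumerate(rank_nodes(scores), start=1):
        if node in root_causes:
            return pos
    return None


def recall_at_k(scores: Dict[str, float], root_causes: Set[str], k: int) -> float:
    """Fraction of true root-cause lines that appear in the top k."""
    top_k = set(rank_nodes(scores)[:k])
    return len(top_k & root_causes) / len(root_causes)


if __name__ == "__main__":
    # Hypothetical commit: one added line, two deleted lines.
    nodes = {"added": ["a1"], "deleted": ["d1", "d2"]}
    edges = {
        "control_flow": [(("deleted", "d1"), ("deleted", "d2"))],
        "data_flow": [(("deleted", "d1"), ("added", "a1"))],
    }
    index, type_vecs, merged_edges = to_homogeneous(nodes, edges)
    print(index, type_vecs, merged_edges)

    # Placeholder scores standing in for the model's output on deleted nodes.
    scores = {"d1": 0.3, "d2": 0.8}
    print(first_rank(scores, {"d1"}), recall_at_k(scores, {"d1"}, 1))
```

Averaging `first_rank` over all evaluated commits would give the mean first rank under this reading. Keeping the type as a separate vector means every node and edge shares the same feature layout after the merge.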
External IDs: doi:10.1109/tce.2025.3614479