Effective Vulnerability Detection over Code Token Graph: A GCN with Score Gate Based Approach

Published: 2024, Last Modified: 15 Jan 2026, Venue: APSEC 2024, License: CC BY-SA 4.0
Abstract: Software systems are integral to many aspects of modern life, so an efficient approach to identifying vulnerabilities is crucial for ensuring security and preventing malicious attacks. In recent years, many deep learning-based methods have shown outstanding performance on the vulnerability detection task. However, these methods still have limitations. Some treat code as a token sequence and apply architectures typically used in natural language processing. They fail to use the structural information that arises from interactions among code components, which limits their performance. Others are based on graph neural networks and, although better at learning structural information, treat every node equally and fail to emphasize key elements. To overcome these limitations, we propose CTGGSG, an effective vulnerability detection approach over the Code Token Graph (CTG) based on a GCN with a score gate. In our model, we use the PLE-CG-SE module to represent source code samples as CTGs, which exploits the high-quality feature representations of a PL-PLM (Pre-trained Language Model based on Programming Languages) while retaining structural information from the source code. During graph learning, we combine GCN convolution with a score gate mechanism so that the model focuses more on the key nodes in the graph and the receptive field of each node increases. To comprehensively evaluate the performance and scalability of our model, we conducted experiments on two real-world datasets: CodeXGLUE, which contains balanced sample labels, and ReVeal, which contains imbalanced sample labels. These datasets contain 27,318 and 22,734 function-level samples, respectively, derived from large-scale, popular real-world projects. Compared to existing advanced vulnerability detection methods, our model achieved state-of-the-art performance overall.
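The abstract does not spell out how the GCN convolution and the score gate are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a standard GCN propagation step with symmetric normalization, followed by a sigmoid gate that assigns each node a score in (0, 1) and rescales that node's features by it. All names here (`normalize_adjacency`, `gated_gcn_layer`, `gate_vec`) are hypothetical, and the toy graph stands in for a Code Token Graph whose node features would come from a pre-trained code model.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_gcn_layer(adj, feats, weight, gate_vec):
    """One GCN step followed by an ASSUMED sigmoid score gate.

    The gate form (dot product with gate_vec, then sigmoid) is an
    illustrative assumption; the abstract does not define it.
    """
    norm = normalize_adjacency(adj)
    hidden = matmul(matmul(norm, feats), weight)
    hidden = [[max(0.0, v) for v in row] for row in hidden]  # ReLU
    # One scalar score per node, used to emphasize or damp that node.
    scores = [sigmoid(sum(h * g for h, g in zip(row, gate_vec)))
              for row in hidden]
    gated = [[s * v for v in row] for s, row in zip(scores, hidden)]
    return gated, scores


# Toy 3-node path graph (0 - 1 - 2) with 2-dimensional node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]   # identity, for readability
gate_vec = [1.0, -1.0]              # hypothetical learned gate parameters

gated, scores = gated_gcn_layer(adj, feats, weight, gate_vec)
print(scores)
```

Stacking several such layers would let each node aggregate information from more distant neighbors, which is one way to read the abstract's remark about increasing the receptive field. In a trained model, `weight` and `gate_vec` would be learned rather than fixed by hand.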