CodeGuard: Structural Code Analysis with Graph Neural Networks for Memory Safety Vulnerability Detection in C/C++
Keywords: Vulnerability Detection, Graph Neural Networks, Static Analysis, Source Code Representation, Software Security, Curriculum Learning
Abstract: Memory safety vulnerabilities in C and C++ remain a critical systemic risk. Traditional static analysis often suffers from high false positive rates, while state-of-the-art machine learning models typically rely on compiler-generated Intermediate Representations (IR) and therefore cannot analyze non-compilable code fragments. We present CodeGuard, a vulnerability detection framework that applies heterogeneous Message Passing Neural Networks (MPNNs) directly to source code. By constructing structural graphs that integrate Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), and data-flow edges, CodeGuard captures complex syntactic dependencies without requiring a build environment. Extensive evaluation on three real-world benchmarks (Big-Vul, Devign, MegaVul) demonstrates that CodeGuard achieves state-of-the-art performance, yielding an F1 score of 95.2% on Big-Vul and 91.8% on the large-scale MegaVul dataset. This approach eliminates the build-chain requirement while outperforming compilation-dependent baselines in both precision and recall.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Code generation and understanding, Security/Privacy
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: C, C++
Submission Number: 3063
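Illustrative sketch: The abstract describes a heterogeneous graph that combines AST, CFG, and data-flow edges and is processed by message passing. The snippet below is a minimal, dependency-free sketch of that idea, not CodeGuard's implementation. The toy graph, the edge-type names (`ast`, `cfg`, `dfg`), the scalar per-type weights, and the residual ReLU update are all assumptions chosen for illustration; the abstract does not specify the actual update rule or graph schema.

```python
# Hypothetical toy graph for the fragment: p = malloc(n); free(p); use(p);
# Node labels and edge lists are illustrative, not CodeGuard's actual schema.
NODES = ["func", "p = malloc(n)", "free(p)", "use(p)"]
EDGES = {
    "ast": [(0, 1), (0, 2), (0, 3)],  # parent -> child
    "cfg": [(1, 2), (2, 3)],          # statement -> next statement
    "dfg": [(1, 2), (1, 3)],          # definition of p -> uses of p
}
# Assumed scalar weight per edge type, standing in for learned parameters.
WEIGHTS = {"ast": 0.5, "cfg": 1.0, "dfg": 2.0}


def message_passing_step(features, edges, weights):
    """One round: every node sums type-weighted features of its in-neighbours."""
    dim = len(features[0])
    incoming = [[0.0] * dim for _ in features]
    for edge_type, pairs in edges.items():
        w = weights[edge_type]
        for src, dst in pairs:
            for k in range(dim):
                incoming[dst][k] += w * features[src][k]
    # Residual update followed by ReLU (an assumed, simplified update rule).
    return [
        [max(0.0, h + m) for h, m in zip(node_h, node_m)]
        for node_h, node_m in zip(features, incoming)
    ]


# One-hot initial features so the result shows which nodes influenced which.
features = [
    [1.0 if i == j else 0.0 for j in range(len(NODES))]
    for i in range(len(NODES))
]
updated = message_passing_step(features, EDGES, WEIGHTS)

for label, vec in zip(NODES, updated):
    print(f"{label:15s} {vec}")
```

In this toy setup, the `use(p)` node ends up with non-zero contributions from the allocation (via the data-flow edge) and from `free(p)` (via the control-flow edge), which is the kind of cross-edge-type signal a use-after-free detector would need. None of this requires compiling the fragment.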