GNPA-DIL: Unveiling the Vulnerability Genome Through Semantic Graph Distillation and Invariant Neural Reasoning
Keywords: code security, transfer learning, reinforcement learning
Abstract: Software vulnerabilities constitute an escalating security crisis, with over 25,000 new CVEs documented annually, demanding detection models that can identify complex vulnerability patterns across evolving codebases. Contemporary vulnerability detection models are severely brittle when deployed beyond controlled benchmarks: they fail to maintain accuracy on rigorously validated samples and break down when confronted with routine syntactic variations or cross-function vulnerability patterns. GNPA-DIL addresses these limitations with a neural architecture trained on vulnerability-centric program slices extracted from Code Property Graphs, learning domain-invariant representations that capture vulnerability semantics rather than superficial code patterns. Because it learns to process heavily compressed program representations, GNPA-DIL avoids the context limits of existing architectures while preserving the information flows that characterize real vulnerabilities. As a result, the model generalizes beyond its training distribution, detecting previously unseen vulnerability types with 63.48% accuracy on Emerging-Post-Vulnerability CVEs. On the SVEN benchmark, GNPA-DIL achieves a 73.58% F1-score versus 54% for the best baseline, a 36% relative improvement, and reaches 67.63% accuracy on cross-function vulnerabilities despite being trained only on function-level data.
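The abstract does not specify how the vulnerability-centric slices are computed. To illustrate the general idea, the sketch below performs a standard backward slice over a toy dependence graph: starting from a potentially dangerous sink, it keeps only the statements that can influence that sink through data or control dependences. The statements, the edge list, and the helper names `backward_slice` and `slice_source` are hypothetical. A real Code Property Graph also merges AST and control-flow information, and the slicing criteria GNPA-DIL actually uses are not stated here.

```python
from collections import deque

# Hypothetical toy program: statement id -> source text.
STATEMENTS = {
    1: "len = read_int();",
    2: "src = get_input();",
    3: 'log("start");',
    4: "if (len > 0)",
    5: "memcpy(buf, src, len);",
    6: 'log("done");',
}

# Hypothetical dependence edges (src, dst): dst data- or control-depends on src.
# This is a simplified stand-in for the dependence layer of a Code Property Graph.
EDGES = [(1, 4), (1, 5), (2, 5), (4, 5), (3, 6)]


def backward_slice(edges, criterion):
    """Return the set of statement ids that can reach `criterion` along dependence edges."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def slice_source(statements, edges, criterion):
    """Return the sliced statements in original program order."""
    return [statements[i] for i in sorted(backward_slice(edges, criterion))]


if __name__ == "__main__":
    # Slicing on the memcpy sink (statement 5) drops the unrelated logging calls.
    for line in slice_source(STATEMENTS, EDGES, 5):
        print(line)
```

In this toy case the slice keeps 4 of 6 statements. That is the kind of compression the abstract relies on to fit relevant flows into a model's context. The reported relative improvement follows directly from the SVEN scores: (73.58 − 54) / 54 ≈ 0.36.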
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 21755