Abstract: While deep learning-based models have achieved remarkable progress in vulnerability detection, our understanding of these models remains limited, which hinders further advancement in model capability, mechanistic understanding of the detection process, and efficient and safe practical deployment. This paper presents a comprehensive investigation of state-of-the-art learning-based models, including sequence-based models, graph-based models, and Large Language Models (LLMs), through extensive experiments conducted on MegaVul, a recently constructed large-scale vulnerability dataset. We systematically explore seven research questions across five critical dimensions: model capability, model interpretation, model robustness, ease of model deployment, and model economy. Our experimental findings reveal the superiority of sequence-based models over graph-based models and demonstrate the limited effectiveness of current LLMs (e.g., ChatGPT and CodeLlama) for vulnerability detection. We identify the specific vulnerability types that different learning-based models excel at detecting and reveal the instability of these models under subtle, semantically equivalent changes to their inputs. Through interpretability analysis, we provide empirical insights into what these models actually learn and attend to during detection. Additionally, we systematically summarize the pre-processing requirements and deployment considerations necessary for practical model usage. Finally, our study provides essential guidelines for the economical and safe application of learning-based models in practice, offering valuable insights for both researchers and practitioners.
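To make the robustness finding concrete, the sketch below illustrates the kind of semantically equivalent input change the abstract refers to: an identifier rename that leaves program behavior untouched, so a robust detector should return the same verdict for both versions. This is a hypothetical, minimal illustration (the function name, the regex-based rename, and the C snippet are assumptions for exposition, not the paper's actual transformation suite, which would more plausibly operate on an AST).

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename an identifier via whole-word regex substitution.

    A deliberately crude, illustrative transformation: a real tool
    would parse the code to avoid touching strings, comments, or
    unrelated names that happen to match.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

# A vulnerable-looking C snippet (hypothetical example, not from MegaVul).
original = """
void copy_input(char *src) {
    char buf[16];
    strcpy(buf, src);  /* potential buffer overflow */
}
"""

# Semantically equivalent variant: behavior is unchanged, only the
# identifier differs; a stable detector should score both the same.
variant = rename_identifier(original, "buf", "local_buffer")
print(variant)
```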