Identifying Meaningful Vulnerability Report in Common Weakness Enumeration

Guanxi Li; Wen Zhao; Lixiao Zhao; Hui Li; Shikai Guo; Li-Ying Hao

14 Aug 2024 (modified: 21 Aug 2024); IEEE ICIST 2024 Conference Submission; CC BY 4.0

Abstract:

Vulnerabilities in Open Source Software (OSS) code, particularly critical ones, offer attackers numerous opportunities and lead to significant economic losses for users. This has driven the development of various models to identify these vulnerabilities. However, previous models often used shallow neural networks with a single feature extraction method and failed to capture deep feature representations. To address this, we propose Vuln-Detector, an approach for automatically identifying dangerous Issue Reports (IRs). Vuln-Detector comprises three components: the Knowledge Bank component, which stores information about Common Weakness Enumeration (CWE) to enhance learning; the Matching component, which measures the similarity between a security report and CWE categories; and the Voting component, which determines whether a report is related to a code vulnerability. We validated our approach through experiments on 3,937 No Security Vulnerability Reports (NSVRs) from 1,390 OSS projects on GitHub. Vuln-Detector achieved a precision of 42%, a recall of 73%, an F1-score of 53%, an AUROC of 98%, and an AUPRC of 41%. Compared to the current state of the art, it shows relative improvements of 11% in precision, 5% in AUPRC, and 8% in F1-score. The results demonstrate that Vuln-Detector effectively identifies vulnerability-related IRs.
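As a quick consistency check on the reported metrics: the F1-score is the harmonic mean of precision and recall, so the stated precision and recall should reproduce the stated F1-score. The sketch below is illustrative only; the helper name `f1_score` is an assumption made here and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1].

    Hypothetical helper for illustration; not part of Vuln-Detector.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 42%, recall 73%.
f1 = f1_score(0.42, 0.73)
print(f"{f1:.2f}")  # prints 0.53
```

The result agrees with the reported F1-score of 53%.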

Submission Number: 119
