Automatic Software Vulnerability Detection in Binary Code

Shigang Liu, Lin Li, Xinbo Ban, Chao Chen, Jun Zhang, Seyit Camtepe, Yang Xiang

Published: 2024, Last Modified: 27 Feb 2026, ML4CS 2024, CC BY-SA 4.0

Abstract: Cybersecurity is critical in today’s digital world, where the severity of threats from software vulnerabilities grows significantly each year. Many techniques have been developed to analyze vulnerabilities in source code. However, source code is not always available (for example, most industry software is closed-source). As a result, analyzing vulnerabilities in binary code becomes necessary, though it is more challenging. This paper presents a novel approach called BiVulD for detecting vulnerabilities at the binary level. BiVulD has three phases: generating assembly language instructions, learning good embeddings, and building a prediction model. First, we create a database of vulnerable binaries using CVE and NVD. Next, we propose using CodeBERT to obtain good embeddings. Finally, we apply a bidirectional LSTM on top of CodeBERT to build the predictive model. To demonstrate BiVulD’s effectiveness, we compared it with several baselines, including source code-based, binary code-based, and machine learning-based techniques, on real-world projects. The experimental results show that BiVulD outperforms the baselines and can detect more vulnerabilities. For instance, BiVulD achieves at least a 20% improvement in Precision, Recall, and F-measure over the baselines. We believe this work will serve as a foundation for future research in vulnerability detection using only binary code.
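
For readers less familiar with the metrics reported above, the following minimal Python sketch shows how Precision, Recall, and F-measure are conventionally computed for a binary vulnerable/non-vulnerable classifier. The function name, the label encoding (1 = vulnerable, 0 = non-vulnerable), and the sample labels are illustrative assumptions and are not taken from the paper; F-measure is assumed here to mean the F1 score.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels.

    Assumption for illustration: 1 marks a vulnerable sample, 0 a non-vulnerable one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, but flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels, used only to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes false alarms and Recall penalizes missed vulnerabilities, so a claimed gain in all three metrics indicates that the improvement is not obtained by trading one kind of error for the other.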

External IDs: dblp:conf/ml4cs/LiuLBCZCX24