A Semantics-Based Approach on Binary Function Similarity Detection

Yuntao Zhang, Binxing Fang, Zehui Xiong, Yanhao Wang, Yuwei Liu, Chao Zheng, Qinnan Zhang

Published: 01 Aug 2024. Last Modified: 07 Jan 2026. Venue: IEEE Internet of Things Journal. License: CC BY-SA 4.0
Abstract: Firmware is a fundamental component of Internet of Things (IoT) devices. Modern IoT firmware development relies extensively on third-party components, which substantially improves development efficiency. However, these components are not inherently secure, and their vulnerabilities can compromise the security of the firmware that incorporates them. Existing research applies binary code similarity analysis to detect known vulnerabilities in firmware, but it faces significant challenges. The primary challenge is extracting function features from the limited semantic information available in binary code. Another is the lack of real-world data sets for assessing a model's performance in practical scenarios such as firmware supply chain analysis. To tackle these challenges, we present program dependence graph to vector (PDG2Vec), a detection model based on program dependence graphs (PDGs). PDG2Vec extracts function features at the variable level on the PDG and assesses function similarity by evaluating whether two functions can represent each other. We conducted evaluations on three data sets, including one we created to simulate a firmware supply chain scenario. The experimental results demonstrate that PDG2Vec is resilient to cross-architecture differences and captures more precise semantics than other approaches. Furthermore, PDG2Vec outperforms state-of-the-art tools in the supply chain analysis scenario, achieving an average area under the curve (AUC) 16% higher than the baseline approaches.
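The abstract frames similarity as whether two functions can represent each other. The sketch below illustrates one way such a bidirectional check could look, assuming each function has already been reduced to a list of variable-level feature vectors. The names `coverage` and `mutual_similarity`, the cosine best-match scoring, and the `min` aggregation are all hypothetical choices for illustration; this is not PDG2Vec's published scoring procedure.

```python
from __future__ import annotations

import math
from typing import Sequence

# Hypothetical illustration only; NOT PDG2Vec's actual algorithm.
# Assumption: each function is a list of variable-level feature vectors.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coverage(src: list[Vector], dst: list[Vector]) -> float:
    """How well `dst` represents `src`: the mean, over src variables,
    of each variable's best cosine match in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)


def mutual_similarity(f: list[Vector], g: list[Vector]) -> float:
    """Symmetric score: each function must represent the other,
    so keep the weaker of the two directions."""
    return min(coverage(f, g), coverage(g, f))


if __name__ == "__main__":
    f = [[1.0, 0.0], [0.0, 1.0]]
    g = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # g has one extra variable
    print(round(mutual_similarity(f, g), 4))  # 0.9024
```

Here `g` fully covers `f`, but `f` only partially covers `g`'s extra variable, so taking the minimum penalizes the mismatch. A mean of the two directions would be a softer alternative; which aggregation suits binary code search is a design choice this sketch does not settle.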