CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform

Published: 01 Jan 2022, Last Modified: 24 Jul 2025WASA (3) 2022EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Malware detection has become a hot research pot as the development of Internet of Things and edge computing have grown in popularity. Specifically, various malware exploits firmware vulnerabilities on hardware platform, resulting in significant financial losses for both IoT users and edge platform providers. In this paper, we propose CodeDiff, a fresh approach for malware vulnerability detection on IoT and edge computing platforms based on the binary file similarity detection. CodeDiff is an unsupervised learning method that employs both semantic and structural information for binary diffing and does not require label data. Through the SkipGram with Negative Sampling, we generate the word vocabulary for instruction data. The Graph AutoEncoder is then used to embed both the semantic and structure information into the representation matrix for the CFG. After this, we employ the Improved Graph AutoEncoder to fuse all the function structures, function characteristics and function features to the fusion matrix. Finally, we propose the specific matrix comparison to achieve the high accuracy similarity results in short amount of time. Furthermore, we test the prototype on binary datasets OpenSSL and Curl. The results reveal that CodeDiff gives high performance on the binary file similarity detection, which contributes to identify malware vulnerability and improves the security of Internet of Things platforms.
Loading