AdvBinSD: Poisoning the Binary Code Similarity Detector via Isolated Instruction Sequences

Published: 01 Jan 2023, Last Modified: 06 Dec 2024 · ISPA/BDCloud/SocialCom/SustainCom 2023 · CC BY-SA 4.0
Abstract: Binary code similarity detection (BinSD) systems increasingly rely on deep learning to identify semantic features of assembly code and exhibit superior performance, gaining popularity over traditional methods. However, existing deep learning models are susceptible to data poisoning attacks, posing a latent threat to the robustness and reliability of BinSD. Existing data poisoning strategies in BinSD are easily detected because the generated triggers destroy code functionality. Moreover, selecting the trigger injection location requires repeated exploration and verification, which increases the attack cost. To address these issues, we propose a novel adversarial scheme, named AdvBinSD, which poisons a deep learning-based binary code similarity detector and makes it sensitive to isolated instruction sequences. In AdvBinSD, isolated instruction sequences refer to instructions that have no data dependencies with other instructions and do not affect the functionality of the original binary code; such sequences are also difficult to detect by verifying syntactic validity or semantic integrity. Unlike existing data poisoning strategies, AdvBinSD first estimates the code fragment that has the greatest impact on software functionality as the poisoning location, and then inserts isolated instruction sequences at this location to synthesize effective poisoned samples. The location estimation is achieved by maximizing the similarity between function-level feature vectors and instruction-level feature vectors, ensuring that the modified assembly code still executes correctly. Furthermore, to improve the efficiency of the feature vector similarity computation, a k-order greedy feature comparison (k-GFC) algorithm is designed. Extensive experiments demonstrate that AdvBinSD can successfully poison state-of-the-art deep learning-based binary code similarity detectors.
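
For illustration only, the location-estimation step described in the abstract can be sketched as follows. The embedding model, vector dimensions, and the greedy top-k shortcut standing in for k-GFC are assumptions rather than the authors' implementation; the sketch merely shows how an insertion point could be chosen by maximizing the similarity between instruction-level and function-level feature vectors.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def estimate_poisoning_location(func_vec: np.ndarray,
                                instr_vecs: list,
                                k: int = 3) -> int:
    """Pick the instruction offset whose embedding is most similar to the
    function-level embedding (hypothetical criterion). The top-k filter is a
    stand-in for the paper's k-order greedy feature comparison (k-GFC)."""
    sims = np.array([cosine_similarity(func_vec, v) for v in instr_vecs])
    top_k = np.argsort(sims)[-k:]           # keep the k most similar candidates
    best = top_k[np.argmax(sims[top_k])]    # greedy pick among the candidates
    return int(best)

# Example with random stand-in embeddings (the paper uses a learned encoder).
rng = np.random.default_rng(0)
func_vec = rng.normal(size=64)
instr_vecs = [rng.normal(size=64) for _ in range(10)]
print(estimate_poisoning_location(func_vec, instr_vecs))
```

In this reading, the isolated instruction sequence would then be inserted at the returned offset, since it carries no data dependencies and leaves the original control flow and functionality intact.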