PMatch: Semantic-based Patch Detection for Binary Programs

Published: 01 Jan 2021, Last Modified: 15 May 2025 · IPCCC 2021 · CC BY-SA 4.0
Abstract: Binary function matching has been proposed to detect known vulnerabilities. However, the high similarity between vulnerable and patched versions of a function leads to a large number of false positives. Patch detection has been proposed to improve the accuracy of function matching by identifying the patched functions among the matching results. However, the accuracy of existing methods drops significantly because high compiler optimization levels introduce changes to the function code. In this paper, we propose PMatch, a method based on code semantic similarity to detect patched binary functions. First, PMatch extracts patch-affected code snippets from the patched binary function. Second, PMatch leverages a novel unsupervised sentence-embedding technique from natural language processing (NLP) to generate semantic representations of binary code. Finally, PMatch matches the patch-affected code snippets against target blocks obtained by function diffing. To evaluate PMatch, we collect 101 CVEs and compile 304 binary programs at 4 different optimization levels. PMatch achieves an average accuracy of 86.43% in detecting patched functions, outperforming the state-of-the-art approach, and requires only 65.14 ms per function. Moreover, at the high O3 optimization level, PMatch achieves an accuracy improvement of over 20%.
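The final matching step can be illustrated with a minimal sketch. It assumes the patch-affected snippets and the target blocks (which PMatch obtains by function diffing) are already available as lists of assembly instruction strings. The abstract does not specify the embedding model, so the sketch swaps in a plain bag-of-normalized-tokens vector compared by cosine similarity; this is only a stand-in for PMatch's unsupervised sentence embedding. The immediate-normalization rule, the 0.8 threshold, the "every snippet must match" decision rule, and all function names (`normalize`, `embed`, `cosine`, `is_patched`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical normalization: replace immediates/addresses with a placeholder
# so code placed at different addresses still compares as similar.
_IMM = re.compile(r"\b(0x[0-9a-fA-F]+|\d+)\b")


def normalize(insn: str) -> list[str]:
    """Split one assembly instruction into normalized tokens."""
    insn = _IMM.sub("IMM", insn.lower())
    return [tok for tok in re.split(r"[\s,\[\]+\-]+", insn) if tok]


def embed(block: list[str]) -> Counter:
    """Stand-in embedding: bag of normalized tokens (NOT PMatch's sentence embedding)."""
    vec = Counter()
    for insn in block:
        vec.update(normalize(insn))
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_patched(patch_snippets, target_blocks, threshold=0.8):
    """Hypothetical rule: report 'patched' only if every patch-affected
    snippet has a sufficiently similar block among the target blocks."""
    if not patch_snippets:
        return False
    targets = [embed(b) for b in target_blocks]
    for snippet in patch_snippets:
        s = embed(snippet)
        best = max((cosine(s, t) for t in targets), default=0.0)
        if best < threshold:
            return False
    return True


# Hypothetical patch: a bounds check inserted before a copy.
patch_snippet = [["cmp rdx, 0x100", "ja 0x401200"]]
patched_target = [["mov rax, rdi"], ["cmp rdx, 0x40", "ja 0x4011f0"], ["call memcpy"]]
vulnerable_target = [["mov rax, rdi"], ["call memcpy"]]

print(is_patched(patch_snippet, patched_target))     # True
print(is_patched(patch_snippet, vulnerable_target))  # False
```

Because constants are normalized away, the bounds check still matches even though the compiled limit and jump target differ. A learned semantic embedding, as PMatch uses, is what would let the comparison tolerate larger optimization-induced changes such as reordered or substituted instructions, which a token bag cannot capture.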