OpTrans: enhancing binary code similarity detection with function inlining re-optimization

Published: 01 Jan 2025, Last Modified: 19 Feb 2025 · Empir. Softw. Eng. 2025 · CC BY-SA 4.0
Abstract: Binary code similarity detection (BCSD) is pivotal in system security, including reverse engineering, vulnerability detection, and software component analysis. Studies on BCSD have proliferated in recent years, yet they perform poorly when confronted with semantic alterations (e.g., function inlining) caused by compiler optimization. To tackle this challenge, we present OpTrans, a framework that fuses binary code Optimization techniques with the Transformer model for BCSD. OpTrans employs an algorithm based on binary program analysis to determine which functions should be inlined, then applies binary rewriting techniques to re-optimize the binaries. This method significantly reduces false positives and enhances model performance in real-world BCSD tasks. We evaluated OpTrans on the BinaryCorp datasets, where it outperformed state-of-the-art BCSD solutions by 21.5% on average. Inline re-optimization improved all BCSD solutions by up to 32.1%. Our ablation study and vulnerability experiment demonstrate the practicality of inline re-optimization in real-world detection scenarios.