Accurate Compiler, Optimization, and Architecture Independent Function Identification using Program State Transformations

Anonymous

03 Feb 2022 (modified: 05 May 2023)
Submitted to JSYS Feb 22
Readers: Everyone
Keywords: Binary Analysis, Binary Similarity
TL;DR: Program state transformations can more accurately identify function semantics, including when code is differently optimized or obfuscated.
Abstract: Patching vulnerabilities in third-party libraries is critical for maintaining security, yet such patches can take over 500 days on average to be distributed. Manually creating binary patches requires semantic analysis to identify the full set of functions present in the library, and existing semantic binary analysis approaches either do not scale or are inaccurate. In this paper, we introduce IOVec Function Identification (IOVFI), which assesses similarity based on program state transformations, a property that compilers largely guarantee to preserve even across compilation environments and architectures. IOVFI executes functions from predetermined initial program states, measures the resulting program state changes, and uses the sets of input and output state vectors as unique semantic fingerprints. Because IOVFI relies on state vectors rather than code measurements, it withstands broad changes in compiler, optimization level, underlying architecture, and even different implementations of equivalent functionality. Crucially, IOVFI is the first approach to support architecture-independent classification. Evaluated as a semantic function identifier for coreutils-8.32, our IOVFI implementation achieves a high average F-score of 0.779, indicating high precision and recall. When identifying functions generated from differing compilation environments, IOVFI achieves a 101% accuracy improvement over the most recent BinDiff 6 and outperforms asm2vec in cross-compilation-environment accuracy; compared to the dynamic frameworks BLEX and IMF-SIM, IOVFI is 25%–53% more accurate. We also show that IOVFI is largely unaffected by code obfuscation, achieving similarly high accuracy against obfuscated code. To demonstrate that state transformations enable cross-ISA identification, we show that IOVFI achieves similarly high accuracy when identifying AArch64 functions using unmodified x64 classification vectors.
We show that IOVFI scales to large binaries by evaluating semantic identification accuracy on three large, commonly used libraries: libxml2, libpng, and libz. Finally, we perform a semantic history analysis of libpng and libz across 14 different versions, correctly identifying the libpng versions distributed with the last five years of Ubuntu releases.
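The core idea described in the abstract is to run a function from fixed initial states, record the resulting state changes, and treat the set of input/output state vectors as its fingerprint. A minimal Python sketch of that idea follows. It rests on loudly simplifying assumptions. The "functions" are Python callables rather than binary code executed under instrumentation. A "program state" is just a tuple of integers. `INITIAL_STATES` holds arbitrary hypothetical values. Matching uses Jaccard overlap with a hypothetical `threshold`, which is not necessarily the classifier IOVFI itself uses. All names (`fingerprint`, `similarity`, `identify`, etc.) are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical, simplified "program state": a tuple of integer values.
# An empty tuple marks an execution that faulted (raised an exception).
State = Tuple[int, ...]
IOVec = Tuple[State, State]  # (initial state, resulting state)
Fingerprint = FrozenSet[IOVec]
CRASHED: State = ()

# Hypothetical predetermined initial states, reused for every function.
INITIAL_STATES: List[State] = [(0, 0), (1, 2), (7, 3), (-5, 5), (100, 1)]


def run_on_state(func: Callable[..., int], state: State) -> State:
    """Execute func from one initial state and return the resulting state."""
    try:
        return (func(*state),)
    except Exception:
        return CRASHED


def fingerprint(func: Callable[..., int]) -> Fingerprint:
    """Set of (input, output) state vectors over all predetermined inputs."""
    return frozenset((s, run_on_state(func, s)) for s in INITIAL_STATES)


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard overlap of two fingerprints (1.0 means identical behavior)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def identify(unknown: Fingerprint, references: Dict[str, Fingerprint],
             threshold: float = 1.0) -> Optional[str]:
    """Return the best-matching reference name, or None if below threshold."""
    if not references:
        return None
    name, ref = max(references.items(),
                    key=lambda kv: similarity(unknown, kv[1]))
    return name if similarity(unknown, ref) >= threshold else None


# Usage: a tiny "reference library" of known functions.
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def maximum(a, b):
    return a if a >= b else b


def add_v2(a, b):
    # Different code, same semantics as add.
    return -(-b - a)


def mul(a, b):
    return a * b


refs = {"add": fingerprint(add), "sub": fingerprint(sub),
        "max": fingerprint(maximum)}
print(identify(fingerprint(add_v2), refs))  # add
print(identify(fingerprint(mul), refs))     # None
```

Because `add_v2` is written differently but produces the same output state for every initial state, its fingerprint equals that of `add`. By contrast, `mul` shares only some input/output vectors with each reference, so it falls below the threshold and stays unidentified.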
Area: Computer Architecture
Type: Tool/benchmark