Abstract: Decompilation translates a low-level programming language (PL), such as binary code, into a high-level PL, such as C/C++. It is widely used for security analysis of software and systems whose source code is unavailable, for example in vulnerability search and malware analysis. However, current decompilation tools typically require substantial expert effort, sometimes over years, to craft their decompilation rules, and those rules need long-term maintenance as the syntax of the high-level or low-level PL changes. Moreover, an ideal decompiler should generate concise high-level code whose functionality is similar to that of the low-level source. In this paper, we propose a novel neural decompilation approach that translates low-level PL into accurate high-level PL. We design a transformer-based neural network model with a data dependency-based masked self-attention scheme and an instruction embedding scheme to accurately learn the mapping rules between low-level and high-level PLs. We also propose a new intermediate language representation to bridge the information asymmetry between high-level and low-level PLs. We implement the proposed approach as ANDE. Evaluations on four real-world applications show that ANDE achieves an average accuracy of 94.41%, substantially higher than prior neural machine translation (NMT) models.
External IDs: doi:10.1007/978-981-96-4731-6_16