Enhancing Neural Decompilation with Code-aware Fine-Tuning and Inference-time Refinement

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Decompilation, Iterative Refinement, Large Language Models
Abstract: Language models (LMs) hold promise for automating binary decompilation by translating low-level assembly into high-level source code, aiding code security analysis. However, current LM-based methods struggle with the complex control and data flows of real-world programs. We present TuneRDEC, a code-structure-aware LM-based approach to binary decompilation. TuneRDEC combines a task-specific LM, fine-tuned on domain-specific data, with a general-purpose LM trained on generic corpora. The task-specific LM uses abstract syntax tree (AST) analysis to assign higher loss weights to key language constructs, such as loops, conditionals, and pointers, prioritizing these constructs during training. At inference time, TuneRDEC employs an iterative self-refinement process guided by compiler feedback and test-driven prompts, leveraging the general-purpose LM to improve the decompiled output. We evaluate our approach on the HumanEval-Decompile benchmark. The results show significant improvements in code readability, functional correctness, and robustness over state-of-the-art neural decompilation methods and an industry-strength decompiler.
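The abstract's construct-weighted fine-tuning objective can be illustrated with a short sketch. The following is a minimal, hypothetical rendering of the idea, not the paper's implementation: it assumes that AST analysis has already produced token-index spans (`construct_spans`) covering loops, conditionals, and pointer operations in the target source, and it scales the per-token cross-entropy loss over those spans.

```python
# Minimal sketch of construct-weighted cross-entropy (PyTorch).
# `construct_spans`, `base`, and `boost` are illustrative names,
# not taken from the paper.
import torch
import torch.nn.functional as F

def build_loss_weights(seq_len, construct_spans, base=1.0, boost=2.0):
    """Upweight target tokens covered by key language constructs.
    `construct_spans` is a list of (start, end) token-index pairs
    derived from AST analysis of the reference source code."""
    weights = torch.full((seq_len,), base)
    for start, end in construct_spans:
        weights[start:end] = boost
    return weights

def weighted_lm_loss(logits, labels, weights):
    """Per-token cross-entropy, scaled by the construct weights."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="none"
    )
    return (per_token * weights.view(-1)).sum() / weights.sum()

# Toy usage: vocab of 100, 8 target tokens, a loop spanning tokens 2..6.
logits = torch.randn(1, 8, 100)
labels = torch.randint(0, 100, (1, 8))
weights = build_loss_weights(8, [(2, 6)])
loss = weighted_lm_loss(logits, labels, weights)
```

Normalizing by the weight sum (rather than the token count) keeps the loss scale comparable across batches with different proportions of weighted constructs.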
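The inference-time refinement loop can likewise be sketched under stated assumptions. The helpers `llm_generate` and `run_tests` below are hypothetical callables standing in for the general-purpose LM and the test harness; the prompts and loop structure are illustrative, not the paper's exact interface. Compiler feedback here is real `gcc` stderr on the candidate C code.

```python
# Hedged sketch of compiler- and test-guided self-refinement.
import os
import subprocess
import tempfile

def compile_c(source: str):
    """Compile a candidate C translation; return (ok, compiler stderr)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "candidate.c")
        with open(src, "w") as f:
            f.write(source)
        proc = subprocess.run(
            ["gcc", "-c", src, "-o", os.path.join(d, "candidate.o")],
            capture_output=True, text=True,
        )
        return proc.returncode == 0, proc.stderr

def refine(asm: str, llm_generate, run_tests, max_rounds: int = 5) -> str:
    """Iteratively ask a general-purpose LM to repair its own output,
    feeding back compiler errors or failing test cases each round."""
    code = llm_generate(f"Decompile this assembly to C:\n{asm}")
    for _ in range(max_rounds):
        ok, errors = compile_c(code)
        if not ok:  # compiler feedback drives the next prompt
            code = llm_generate(
                f"Fix the compiler errors.\nErrors:\n{errors}\nCode:\n{code}"
            )
            continue
        passed, failures = run_tests(code)  # test-driven feedback
        if passed:
            break
        code = llm_generate(
            f"The code compiles but fails these tests:\n{failures}\nCode:\n{code}"
        )
    return code
```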
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8956