LLMs as Reverse Engineers? Not Yet on Types and Names

ICLR 2026 Conference Submission 18193 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reverse Engineering, Name Recovery, Type Inference, LLM Benchmarking
TL;DR: Systematically and extensively benchmarking LLMs for reverse engineering, especially on name recovery and type inference
Abstract: Large Language Models (LLMs) have shown promising potential in reverse engineering tasks such as function name recovery, owing to their ability to generate meaningful identifiers conditioned on the input. However, existing studies primarily emphasize fine-tuning LLMs for particular applications, often without providing a clear rationale for selecting a given model. To address this gap, we systematically evaluate and quantify the performance of widely used open-source mid-sized LLMs, including CodeLlama, Llama 2, and DeepSeek-R1, on two core reverse engineering tasks: name recovery and type inference. Our experimental results reveal that, without fine-tuning, none of these models achieves a high F1 score on either task. These findings enhance our understanding of the practical utility of LLMs in binary analysis and highlight critical avenues for improving their effectiveness in reverse engineering and related domains.
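The abstract evaluates name recovery with F1 score but does not state the exact metric; a common choice in this literature is token-level F1 over identifier sub-tokens, so that a partially correct name earns partial credit. Below is a minimal sketch of that metric, assuming token-level scoring; the helper names (`tokenize_identifier`, `name_f1`) are hypothetical, not from the paper.

```python
import re
from collections import Counter

def tokenize_identifier(name: str) -> list[str]:
    """Split an identifier into lowercase tokens on snake_case and camelCase."""
    parts = re.split(r"[_\W]+", name)
    tokens = []
    for part in parts:
        # Split camelCase/PascalCase boundaries, e.g. "parseHttpHeader" -> ["parse", "Http", "Header"]
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens if t]

def name_f1(predicted: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted and a ground-truth function name."""
    pred = Counter(tokenize_identifier(predicted))
    gold = Counter(tokenize_identifier(ground_truth))
    overlap = sum((pred & gold).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

# Example: a partially correct prediction earns partial credit.
print(name_f1("parse_http_header", "parseHeader"))  # 0.8
```

Token-level scoring is preferred over exact match here because naming conventions differ across binaries, and an exact-match criterion would score semantically close predictions like the example above as complete failures.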
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18193