Keywords: Large Language Models (LLMs), Code Translation, Assembly, Code Analysis, Decompilation
Abstract: Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance, yet systematic benchmarks for evaluating large language models (LLMs) on this problem remain scarce. In this work, we present the first comprehensive evaluation of five state-of-the-art LLMs on assembly-to-source translation. We assess model performance using a diverse set of metrics capturing lexical similarity (BLEU, ROUGE, METEOR), semantic alignment (BERTScore), fluency (Perplexity), and efficiency (inference time). Our results reveal clear trade-offs: while certain models excel on text-similarity metrics, others achieve lower perplexity or faster inference. We further provide qualitative analyses of typical model successes and failure cases, highlighting challenges such as control-flow recovery and identifier reconstruction. Taken together, our benchmark offers actionable insights into the strengths and limitations of current LLMs for program translation, establishing a foundation for future research on combining accuracy with efficiency in real-world applications.
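The lexical-similarity metrics mentioned above (BLEU in particular) score a candidate translation by its clipped n-gram overlap with a reference. As a rough illustration only, the sketch below implements a simplified sentence-level BLEU in pure Python (no smoothing, single reference); the tokenized C snippets are invented examples, and a real evaluation would use an established library implementation instead.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Simplified sentence BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    so any zero precision yields a score of 0."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical example: a model-produced decompilation that recovers the
# structure of the ground-truth source but not the identifier names.
ref = "int add ( int a , int b ) { return a + b ; }"
hyp = "int add ( int x , int y ) { return x + y ; }"
score = bleu(ref, hyp)
```

Note how renamed identifiers alone already depress the score, which is one reason lexical metrics are complemented here with semantic measures such as BERTScore.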
Submission Number: 69