Keywords: Smart Contract, Decompilation
Abstract: Smart contracts are programs deployed on blockchains that manage digital assets and enable decentralized applications. While their bytecode is always accessible on-chain, more than 99% of Ethereum contracts lack verified source code, making decompilation essential for transparency and security analysis.
Traditional decompilers rely on program analysis to produce structured but low-level representations. Recent advances in large language models (LLMs) enable source-like output with higher readability and even recompilability. Yet systematic evaluation is missing: existing tools use narrow datasets and inconsistent metrics, hindering fair comparison and reproducibility.
We present the first systematic benchmark for smart contract decompilation. Our contributions are: (i) a diverse dataset of real-world contracts, filtered for redundancy and stratified by difficulty; (ii) a staged evaluation framework with metrics for format completeness, compilability, Application Binary Interface (ABI) recovery accuracy, and semantic equivalence; and (iii) baseline evaluations using a fine-tuned reference model, which provide a reference point for future work.
Our benchmark establishes a common ground for rigorous, reproducible evaluation and aims to accelerate the development of reliable smart contract decompilers for blockchain security and transparency.
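To illustrate the staged evaluation described above, the following Python sketch gates each stage on the one before it and scores ABI recovery as an F1 overlap of function signatures. The stage names follow the abstract. The function names (`furthest_stage`, `abi_recovery_f1`), the pass/fail gating, and the F1 scoring are assumptions made for illustration, not the benchmark's actual definitions.

```python
from typing import Dict, Optional, Set

# Stage order taken from the abstract; gating each stage on the
# previous one is an assumption of this sketch.
STAGES = ["format", "compile", "abi", "semantic"]


def furthest_stage(passed: Dict[str, bool]) -> Optional[str]:
    """Return the last stage passed before the first failure, or None."""
    last = None
    for stage in STAGES:
        if not passed.get(stage, False):
            break  # later stages are not credited once one stage fails
        last = stage
    return last


def abi_recovery_f1(reference: Set[str], recovered: Set[str]) -> float:
    """F1 overlap between reference and recovered function signatures.

    Hypothetical scoring rule: the paper's ABI metric may differ.
    """
    if not reference and not recovered:
        return 1.0  # nothing to recover and nothing recovered
    true_pos = len(reference & recovered)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(recovered)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a decompiled contract that is well formed and compiles but fails the ABI check is credited up to `compile`, regardless of any later result. A recovered ABI sharing one of two signatures with the reference, and containing one spurious signature, scores 0.5.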
Primary Area: datasets and benchmarks
Submission Number: 21287