From Cryptic to Clear - Training on LLM Explanations to Detect Smart Contract Vulnerabilities

Yizhou Chen, Zeyu Sun, Guoqing Wang, Qingyuan Liang, Xiao Yu, Dan Hao

Published: 19 Jul 2025, Last Modified: 22 Jan 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: Smart contracts have revolutionized the way transactions are executed, offering decentralized and immutable frameworks. The immutability of smart contracts poses significant risks when vulnerabilities exist in their code, leading to financial losses. Despite advancements in using deep learning for smart contract vulnerability detection (SCVD), existing methods struggle with the complex logic and intricate semantics embedded within smart contract code. Large Language Models (LLMs) have shown promise in providing deeper insights into smart contract logic. However, LLMs such as GPT follow a decoder-only architecture and are trained in an unsupervised manner rather than on task-specific labels. As a result, in the SCVD task these LLMs have difficulty capturing vulnerability-related information, leading to very low accuracy. Therefore, we propose CodeXplain, a novel SCVD approach that combines the deep code insights of LLMs with the supervised learning capabilities of deep learning models to achieve state-of-the-art performance. In particular, we deeply analyze 14 types of dangerous and common smart contract vulnerabilities. Based on the rationale of these vulnerabilities, nine perspective prompts are introduced to guide LLMs in generating code explanations that contribute to SCVD. Then, we propose a CodeT5-based semantic fusion module that integrates smart contract code and code explanations. Finally, the performance of SCVD is improved by performing supervised learning on trusted labels. Experimental results on 3,544 real-world smart contracts demonstrate that CodeXplain outperforms 16 state-of-the-art SCVD methods, achieving an F1-score of 94.12% and an accuracy of 93.88%.
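
The data flow described in the abstract (perspective prompts, LLM explanations, fusion with the code, supervised labels) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the three perspective names, the prompt wording, the `stub_llm` placeholder, and the plain text concatenation in `fuse`, which merely stands in for the CodeT5-based semantic fusion module and is not the authors' implementation. The abstract does not list the nine actual prompts.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical perspectives; the paper uses nine, which the abstract does not enumerate.
PERSPECTIVES = ["external calls", "access control", "arithmetic operations"]


def build_prompts(code: str, perspectives: List[str]) -> List[str]:
    """Create one explanation prompt per perspective (wording is assumed)."""
    return [
        f"Explain the following smart contract from the perspective of {p}:\n{code}"
        for p in perspectives
    ]


def explain(code: str, llm: Callable[[str], str], perspectives: List[str]) -> List[str]:
    """Ask the LLM for one code explanation per perspective prompt."""
    return [llm(prompt) for prompt in build_prompts(code, perspectives)]


def fuse(code: str, explanations: List[str], sep: str = " </s> ") -> str:
    """Join code and explanations into one input string.

    This is a simple text-level stand-in for a learned fusion module.
    """
    return sep.join([code, *explanations])


@dataclass
class Example:
    fused_text: str  # input for a supervised classifier
    label: int       # trusted label: 1 = vulnerable, 0 = not vulnerable


def make_example(code: str, label: int, llm: Callable[[str], str],
                 perspectives: List[str] = PERSPECTIVES) -> Example:
    """Build one supervised training example from code, explanations, and a label."""
    return Example(fuse(code, explain(code, llm, perspectives)), label)


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; echoes the first line of the prompt."""
    return "explanation for: " + prompt.splitlines()[0]


if __name__ == "__main__":
    example = make_example("contract A {}", 1, stub_llm)
    print(example.label, len(example.fused_text))
```

In a fuller setup, `stub_llm` would be replaced by a call to an actual LLM, and the resulting examples would be used to fine-tune a supervised classifier on the trusted labels.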