CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone

Abstract: Blockchain, a decentralized distributed ledger database, records transactions across multiple computers in a secure, transparent, and tamper-resistant manner. To ensure this, smart contract code is introduced to predefine transaction rules and to execute automatically, without intermediaries, whenever someone calls it. Consequently, if malicious actors call code that contains vulnerabilities, its automatic execution may cause significant economic losses to users. The security of smart contract code is therefore crucial in the blockchain domain. Currently, smart contracts are written primarily by hand, which brings challenges such as a shortage of experienced developers, low development efficiency, and substantial security risks. There is an urgent need for code generation technology that helps both developers and non-professional programmers create secure and efficient smart contract code. In this paper, we propose CodeBC, a more secure smart contract Code generation model for Blockchain, which employs a two-stage fine-tuning approach based on CodeLlama. The first stage uses a multi-task learning framework for code infilling and vulnerability detection, enhancing the model's understanding of smart contract code and its ability to identify security vulnerabilities. The second stage employs tags-guided instruction fine-tuning to improve the model's comprehension of human instructions, so that it generates more secure code. We construct a Blockchain-HumanEval dataset to assess whether the generated code meets human requirements. Experimental results demonstrate that CodeBC achieves higher BLEU and CodeBLEU scores, higher compilation pass rates, and lower vulnerability rates than the baselines, validating the effectiveness of our two-stage fine-tuning strategy.
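
The two-stage recipe described in the abstract can be illustrated with a minimal data-formatting sketch. Everything in it is an assumption made for illustration: the abstract does not specify the sentinel strings, the prompt wording, the tag vocabulary (`[secure]` / `[vulnerable]`), or how tags are attached, and the function names are hypothetical. It only shows what a code-infilling example, a vulnerability-detection example, and a tag-prefixed instruction example might look like.

```python
import random

# Sentinel strings chosen for this sketch (assumption); the paper's actual
# infilling tokens are not given in the abstract.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def make_infilling_example(code: str, rng: random.Random) -> dict:
    """Stage 1, task A: hide a random middle span and ask the model to fill it in."""
    if len(code) < 3:
        raise ValueError("code too short to split into prefix/middle/suffix")
    # Two distinct cut points give a non-empty prefix and a non-empty middle.
    i, j = sorted(rng.sample(range(1, len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return {"input": f"{PRE}{prefix}{SUF}{suffix}{MID}", "target": middle}


def make_detection_example(code: str, is_vulnerable: bool) -> dict:
    """Stage 1, task B: binary vulnerability detection (hypothetical prompt wording)."""
    prompt = f"Is the following smart contract vulnerable?\n{code}"
    return {"input": prompt, "target": "yes" if is_vulnerable else "no"}


def make_tagged_instruction(instruction: str, code: str, is_vulnerable: bool) -> dict:
    """Stage 2: prefix the instruction with a security tag (hypothetical tag names)."""
    tag = "[vulnerable]" if is_vulnerable else "[secure]"
    return {"input": f"{tag} {instruction}", "target": code}


if __name__ == "__main__":
    sample = "contract Counter { uint256 public n; function inc() public { n += 1; } }"
    rng = random.Random(0)
    print(make_infilling_example(sample, rng))
    print(make_detection_example(sample, is_vulnerable=False))
    print(make_tagged_instruction("Write a simple counter contract.", sample, False))
```

Under this assumed scheme, a model would be prompted at inference time with the `[secure]` tag so that generation is conditioned on the secure examples seen during training.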

Paper Type: long

Research Area: Generation

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models

Languages Studied: English
