Learning from Errors: A Data-Efficient Adaptation Method of Large Language Models for Code Generation

ACL ARR 2024 June Submission5035 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have achieved substantial advances in code generation, but they still struggle in specific code generation scenarios. Such scenarios often require LLMs to be adapted to specific needs, yet the limited training data available in practice leads to poor code generation performance. How to effectively adapt LLMs to new scenarios with little training data is therefore a major challenge for current code generation. In this paper, we propose a novel and effective adaptation method, DEED, which stands for Data-Efficient adaptation based on Error-Driven learning for code generation. DEED treats the errors made by an LLM as learning opportunities and overcomes the model's own shortcomings through error revision, thereby achieving efficient learning. Specifically, DEED identifies erroneous code generated by the LLM, applies Self-revise to repair it, optimizes the model with the revised code, and iterates this process for continuous improvement. Experimental results show that, using only a small amount of training data, DEED outperforms mainstream fine-tuning and prompting methods, with an average relative improvement of 54.7% in Pass@1 across multiple code generation datasets. We also verify the effectiveness of Self-revise, whose revised code optimizes the model more efficiently than the code samples drawn from the datasets. Moreover, DEED consistently performs well across various LLMs, highlighting its generalizability.
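The identify-revise-optimize-iterate loop described in the abstract can be sketched as follows. This is a toy illustration only: the `ToyModel` class, its methods, and the deterministic "revision" rule are hypothetical stand-ins, not the authors' actual training pipeline or API.

```python
# Hypothetical sketch of the DEED adaptation loop from the abstract.
# ToyModel is an illustrative stand-in for an LLM: it "solves" a task
# correctly iff the task id is in its `known` set.

class ToyModel:
    def __init__(self, known=()):
        self.known = set(known)

    def generate(self, task_id):
        # Step 1: sample code for a task; True = correct, False = erroneous.
        return task_id in self.known

    def self_revise(self, task_id):
        # Step 2: attempt to repair erroneous code (Self-revise).
        # Toy rule: revision succeeds for most, but not all, tasks.
        return task_id % 10 < 8

    def fine_tune(self, revised_task_ids):
        # Step 3: optimize the model on the revised code.
        self.known |= set(revised_task_ids)


def deed(model, tasks, rounds=3):
    # Step 4: iterate the identify-revise-optimize cycle.
    for _ in range(rounds):
        revised = [t for t in tasks
                   if not model.generate(t)      # identify erroneous outputs
                   and model.self_revise(t)]     # keep successful revisions
        model.fine_tune(revised)                 # learn from revised errors
    return model
```

In this toy setting, training only on successfully revised failures raises the model's pass rate round by round, mirroring the data-efficient, error-driven idea: the supervision signal comes from the model's own corrected mistakes rather than from a large external dataset.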
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Code Generation, Sample-Efficient Adaptation, Large Language Model
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Programming Language
Submission Number: 5035