Code Generation from Natural Language Using Two-Way Pre-Training

Published: 01 Jan 2023, Last Modified: 14 May 2025 · ICACI 2023 · CC BY-SA 4.0
Abstract: Code generation aims to produce code from natural language (NL) descriptions. A known bottleneck is the scarcity of parallel data, since manual annotation is costly. Previous work has compensated for this by designing specialized model structures that inject prior knowledge, but such designs complicate the model and make it harder to use and maintain. In fact, model performance can be improved by better exploiting the available data, without any complicated changes to the model structure. In this paper, we adopt the idea of continual pre-training and use a two-way pre-training approach to improve model performance. This approach requires no task-specific model modifications. On the CoNaLa dataset, our method achieves a BLEU score of 33.30, an improvement of 2.59 over the baseline and 0.73 over the state of the art at the time of our experiments. Finally, we present case studies that demonstrate the code generation capability of our two-way pre-training method.
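
The abstract describes the approach only at a high level. The sketch below illustrates one plausible reading of "two-way" continual pre-training: the same sequence-to-sequence model is further pre-trained on both directions of NL/code pairs (NL to code and code to NL) before task fine-tuning. The base model (t5-small), the direction prefixes, the toy data, and the training loop are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of two-way continual pre-training on NL<->code pairs.
# Assumptions (not from the paper): t5-small as the base model, a tiny
# in-memory dataset, and a text prefix to mark each direction.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Toy parallel data: (natural-language intent, code snippet).
pairs = [
    ("reverse a list xs", "xs[::-1]"),
    ("read a file f into a string", "open(f).read()"),
]

# Build training examples in both directions from the same parallel corpus.
examples = []
for nl, code in pairs:
    examples.append(("generate code: " + nl, code))  # NL -> code
    examples.append(("describe code: " + code, nl))  # code -> NL

model.train()
for src, tgt in examples:
    inputs = tokenizer(src, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    outputs = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    labels=labels)
    outputs.loss.backward()  # standard cross-entropy seq2seq loss
    optimizer.step()
    optimizer.zero_grad()
```

Because both directions reuse the same parallel corpus, this style of continual pre-training adds no new data requirements and no task-specific architectural changes, which matches the simplicity argument made in the abstract.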