Syntax-Aware Retrieval Augmented Code Generation

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: NLP Applications
Submission Track 2: Natural Language Generation
Keywords: Code Generation, Retrieval Augmented Generation, Neural-Symbolic
TL;DR: A novel retrieval augmented code generation approach.
Abstract: Neural code generation models are now widely adopted to automatically generate code from natural language descriptions. Recently, pre-trained neural models equipped with token-level retrieval capabilities have exhibited great potential in neural machine translation. However, applying them directly to code generation poses challenges: the retrieval-based mechanism inevitably introduces extraneous noise into the generation process, which can yield syntactically incorrect code. Computationally, such models must also search the cached datastore frequently, which is time-consuming. To address these issues, we propose $k$NN-TRANX, a token-level retrieval augmented code generation method. $k$NN-TRANX searches smaller datastores tailored to the code generation task, and it applies syntax constraints during datastore retrieval, which reduces the impact of retrieval noise. We evaluate $k$NN-TRANX on two public datasets, and the experimental results confirm the effectiveness of our approach.
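To make the retrieval mechanism concrete, below is a minimal sketch of one decoding step of syntax-constrained token-level $k$NN retrieval, in the spirit of $k$NN-MT. This is not the authors' implementation: the function and parameter names, the L2 distance metric, the softmax temperature, and the fixed interpolation weight are all illustrative assumptions.

```python
import numpy as np

def knn_augmented_next_token(
    model_logits: np.ndarray,      # (vocab,) logits from the base code generator
    query: np.ndarray,             # (dim,) decoder hidden state at this step
    datastore_keys: np.ndarray,    # (n, dim) cached context representations
    datastore_values: np.ndarray,  # (n,) token ids paired with each key
    syntax_mask: np.ndarray,       # (vocab,) True where the grammar allows the token
    k: int = 8,
    temperature: float = 10.0,
    lam: float = 0.5,              # interpolation weight (hypothetical fixed value)
) -> np.ndarray:
    """Mix the model distribution with a kNN distribution over retrieved
    tokens, keeping only syntactically valid candidates. Illustrative sketch."""
    vocab = model_logits.shape[0]

    # Retrieve the k nearest cached contexts by L2 distance
    # (a real system would use an approximate index instead of a full scan).
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nn = np.argsort(dists)[:k]

    # Turn negative distances into a distribution over the retrieved tokens.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab)
    for w, tok in zip(weights, datastore_values[nn]):
        p_knn[tok] += w

    # Base model distribution (stable softmax).
    p_model = np.exp(model_logits - model_logits.max())
    p_model /= p_model.sum()

    # Interpolate, then zero out tokens the syntax disallows and renormalize
    # (assumes the grammar always permits at least one token).
    p = lam * p_knn + (1.0 - lam) * p_model
    p = np.where(syntax_mask, p, 0.0)
    return p / p.sum()
```

In practice, the datastore lookup would typically go through an approximate nearest-neighbor index (e.g., FAISS) rather than the brute-force scan shown here, and the paper's syntax constraints come from the grammar-guided decoder rather than a precomputed vocabulary mask.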
Submission Number: 2332