RAR: Retrieval Augmented Retrieval for Code Generation in Low Resource Languages

ACL ARR 2024 June Submission3554 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Language models struggle to generate correct code for low-resource programming languages, since these languages are underrepresented in training data. Popular approaches use either examples or documentation to improve the performance of these models. Instead of retrieving this information independently, we introduce retrieval augmented retrieval (RAR), a two-step retrieval method that selects relevant examples and then uses them to select relevant documentation. Extensive experiments on two low-resource languages (Power Query M and OfficeScript) show that RAR outperforms example-only or grammar-only retrieval techniques (by 2.81–26.14%).
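The two-step retrieval idea from the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: the function names (`rar`, `retrieve`, `jaccard`) and the token-overlap scoring are illustrative assumptions standing in for whatever retriever the paper actually uses.

```python
# Hedged sketch of two-step "retrieval augmented retrieval" (RAR):
# step 1 retrieves code examples for a query; step 2 retrieves
# documentation conditioned on both the query and those examples.
# The Jaccard scorer below is an illustrative stand-in retriever.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the top-k corpus entries most similar to the query."""
    return sorted(corpus, key=lambda doc: jaccard(query, doc), reverse=True)[:k]

def rar(query: str, examples: list[str], docs: list[str], k: int = 2):
    # Step 1: retrieve examples relevant to the natural-language query.
    top_examples = retrieve(query, examples, k)
    # Step 2: expand the query with the retrieved examples, so that
    # documentation for the constructs those examples use is surfaced.
    expanded = query + " " + " ".join(top_examples)
    top_docs = retrieve(expanded, docs, k)
    return top_examples, top_docs

# Toy Power Query M corpora for illustration.
examples = [
    "filter rows in a table using Table.SelectRows",
    "add a column with Table.AddColumn",
]
docs = [
    "Table.SelectRows documentation: filter rows by condition",
    "Table.AddColumn documentation: add a computed column",
]
ex, dd = rar("filter rows", examples, docs, k=1)
```

With the query "filter rows", step 1 surfaces the `Table.SelectRows` example, and step 2's expanded query then ranks the `Table.SelectRows` documentation first, even though the original query never mentions that function.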
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: code generation, retrieval augmented generation, context retrieval, grammar prompting
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3554