Repository-Level Prompt Generation for Large Language Models of Code

01 Jun 2022 (modified: 22 Oct 2023) ICML 2022 Workshop KRLM Readers: Everyone
Keywords: codex, large language models for source code, code-autocompletion, information retrieval, domain-knowledge
Abstract: With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex (Chen et al., 2021) used in GitHub Copilot), it becomes important to develop techniques that can introduce domain-specific knowledge into the prompt design process. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using a set of rules. These rules allow us to take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our proposed rules gives up to 36% relative improvement over Codex, showing the quality of our proposed rules. Further, we show that when we train a model to select the best rule, we can achieve significant performance gains over Codex.
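The abstract describes the approach only at a high level. As a rough illustration (not the paper's implementation), the Python sketch below shows how a "rule" might pull repository context, here from imported files or the current file, and prepend it to the usual left context before querying a black-box code LLM. All function names, the rule set, and the character budget are hypothetical.

```python
# Minimal sketch of rule-based prompt construction from repository context.
# All names below are hypothetical and do not come from the paper.
from pathlib import Path


def context_from_imports(target_file: Path, repo_root: Path) -> str:
    """Collect source text from repository files imported by the target file."""
    context = []
    for line in target_file.read_text().splitlines():
        if line.startswith(("import ", "from ")):
            module = line.split()[1].split(".")[0]
            candidate = repo_root / f"{module}.py"
            if candidate.exists():
                context.append(candidate.read_text())
    return "\n".join(context)


def context_from_current_file(target_file: Path, repo_root: Path) -> str:
    """Fall back to the text of the file being completed."""
    return target_file.read_text()


# Each "rule" maps a source of repository context to a prompt prefix.
RULES = [context_from_imports, context_from_current_file]


def build_prompt(rule_id: int, target_file: Path, repo_root: Path,
                 prior_code: str, budget_chars: int = 4000) -> str:
    """Prepend rule-selected repository context to the code preceding the hole."""
    extra = RULES[rule_id](target_file, repo_root)
    return extra[:budget_chars] + "\n" + prior_code
```

In the paper's setting, a learned rule-selection model (or an oracle, for the upper-bound numbers reported above) would choose `rule_id` per example; the resulting prompt is then sent to the LLM without any access to its weights.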
Community Implementations: 2 code implementations (https://www.catalyzex.com/paper/arxiv:2206.12839/code)