Keywords: Code completion, RAG, LLMs
Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising approach for repository-level code completion by integrating cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation. To better understand the contribution of the retrieved cross-file contexts, we introduce a likelihood-based metric to evaluate the impact of each retrieved code chunk on the completion. Our analysis reveals that, despite retrieving numerous chunks, only a small subset positively contributes to the target completion, while some chunks even degrade performance. To address this issue, we leverage this metric to construct a repository-level dataset where each retrieved chunk is labeled as positive, neutral, or negative based on its relevance to the target completion. We then propose an adaptive retrieval context trimming framework, REPOFILTER, trained on this dataset to mitigate the harmful effects of negative retrieved contexts in RAG-based code completion. Extensive evaluation on the RepoEval and CrossCodeLongEval benchmarks demonstrates that REPOFILTER consistently improves completion accuracy compared to approaches without filtering operations across various tasks. Additionally, REPOFILTER significantly reduces the length of the input prompt, enhancing computational efficiency while exhibiting strong generalizability across different models. These results underscore the potential of REPOFILTER to enhance the accuracy, efficiency, and attributability of RAG-based repository-level code completion.
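The abstract does not spell out the likelihood-based metric, so the following is only a minimal sketch of one plausible reading: a chunk's impact is taken to be the change in the target completion's log-likelihood when that chunk is added to the prompt, and the chunk is labeled positive, neutral, or negative by comparing that change to a threshold. The names `chunk_impact`, `label_chunk`, `label_chunks`, the `threshold` value, and the toy scorer are all hypothetical and are not taken from the paper; in practice `log_likelihood` would wrap a code LLM.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: given the retrieved chunks placed in the prompt,
# return the log-likelihood of the ground-truth completion.
Scorer = Callable[[Sequence[str]], float]


def chunk_impact(log_likelihood: Scorer, chunk: str) -> float:
    """Assumed metric: log-likelihood gain from adding one chunk to the prompt."""
    return log_likelihood([chunk]) - log_likelihood([])


def label_chunk(impact: float, threshold: float = 0.5) -> str:
    """Map an impact score to a label; the threshold is an illustrative choice."""
    if impact > threshold:
        return "positive"
    if impact < -threshold:
        return "negative"
    return "neutral"


def label_chunks(log_likelihood: Scorer, chunks: Sequence[str],
                 threshold: float = 0.5) -> List[str]:
    """Label every retrieved chunk independently of the others."""
    return [label_chunk(chunk_impact(log_likelihood, c), threshold)
            for c in chunks]


def toy_log_likelihood(chunks: Sequence[str]) -> float:
    """Stand-in for a real model: fixed per-chunk bonuses on a base score."""
    bonus = {
        "def helper(): ...": 3.0,       # helpful cross-file definition
        "import os": 0.1,               # nearly irrelevant
        "def helper(x, y): ...": -2.0,  # misleading signature
    }
    return -10.0 + sum(bonus.get(c, 0.0) for c in chunks)


retrieved = ["def helper(): ...", "import os", "def helper(x, y): ..."]
labels = label_chunks(toy_log_likelihood, retrieved)
# Trimming step under this sketch: keep only chunks labeled positive.
kept = [c for c, lab in zip(retrieved, labels) if lab == "positive"]
```

Under these assumptions, filtering reduces to dropping every chunk whose label is not positive, which is also what shortens the prompt.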
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6125