Can Long-Context Language Models Solve Repository-Level Code Generation?

Published: 06 Apr 2025, Last Modified: 18 Apr 2025 · LTI-SRS 2025 Poster · CC BY 4.0
Track: Main Track
Keywords: Long-Context Language Models, RAG, Code Generation
Abstract: As real-world tasks increasingly demand long contexts, recent language models (LMs) have begun to support longer context windows. One particularly complex task is repository-level code generation, where retrieval-augmented generation (RAG) has become the de facto approach. Nonetheless, RAG may not be optimal for processing entire codebases with cross-file dependencies. We therefore ask: can we instead leverage long-context (LC) LMs to solve repository-level code generation problems? To answer this question, we conduct a comparative study of LC and RAG methods using a top-performing open-source model, CODELLAMA 7B, and a closed model, CLAUDE-3.5-sonnet. We evaluate on the repository-level code completion benchmark RepoEval (Zhang et al., 2023) and find that LC can match or surpass RAG performance when the repository is sufficiently small and well-structured, yet RAG still outperforms LC when the repository grows larger or involves complex structures, dependencies, or domain-specific implementations. We further ablate context ordering and code snippet chunking, finding that better ordering of input code snippets can boost LC results, while chunking design choices such as size and overlap do not produce prominent effects. Overall, our work reveals the scenarios where current LC methods are effective and where they fall short in repository-level code generation, potentially offering insights for future method development.
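To make the ablated chunking design choices concrete, below is a minimal, hypothetical sketch of the fixed-size, overlapping snippet chunking commonly used in RAG retrieval pipelines. The function name and parameters (`chunk_code`, `chunk_size`, `overlap`) are illustrative assumptions, not taken from the paper's implementation.

```python
def chunk_code(lines, chunk_size=30, overlap=10):
    """Split a file's lines into overlapping windows of `chunk_size` lines.

    `chunk_size` and `overlap` are the kind of design choices the
    abstract's chunking ablation varies (illustrative values only).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # stride between consecutive windows
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + chunk_size]))
        if start + chunk_size >= len(lines):
            break  # last window already covers the end of the file
    return chunks

# Example: a 100-line file with 30-line chunks and 10-line overlap
source = [f"line {i}" for i in range(100)]
chunks = chunk_code(source, chunk_size=30, overlap=10)
```

Each retrieved chunk would then be embedded and ranked against the completion context; the finding above suggests results are fairly insensitive to the exact `chunk_size`/`overlap` setting.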
Submission Number: 11
