LlavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation

16 Sept 2025 (modified: 15 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: RAG, AST, DCT, LLM, code
TL;DR: We propose LlavaCode, a method to compress repository context for retrieval-augmented code generation. By using compact representations, we cut sequence length and speed up line completion by up to 19% without hurting prediction quality.
Abstract: Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, especially when context from the surrounding repository is important. However, adding this context substantially increases sequence length, which slows inference, an important limitation for interactive settings such as IDEs. In this work, we introduce LlavaCode, a framework that compresses context into compact, semantically rich representations that remain interpretable to code LLMs. This improves generation quality while reducing prompt augmentation to only a few compressed single-token vectors. Our approach requires training only a small projector module and introduces negligible additional latency, yet it significantly improves the prediction quality of code LLMs. Our experiments show that LlavaCode enables a 20–38% reduction in Time-to-First-Token (TTFT) on line-completion tasks compared with uncompressed RAG.
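The abstract describes a LLaVA-style design: a small trainable projector maps retrieved repository context into a few single-token vectors that stand in for the raw retrieved text in the prompt. The paper's exact architecture is not given on this page, so the following is a minimal, hypothetical sketch of that idea in PyTorch; the class name ContextProjector, the MLP shape, the pooled-context input, and all dimensions (enc_dim, llm_dim, k) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextProjector(nn.Module):
    """Hypothetical projector (not the paper's exact module): maps a pooled
    embedding of the retrieved context to k compressed "soft tokens" living
    in the code LLM's input-embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int, k: int = 4):
        super().__init__()
        self.k = k
        self.llm_dim = llm_dim
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim * k),
            nn.GELU(),
            nn.Linear(llm_dim * k, llm_dim * k),
        )

    def forward(self, ctx_emb: torch.Tensor) -> torch.Tensor:
        # ctx_emb: (batch, enc_dim) pooled embedding of the retrieved context
        out = self.proj(ctx_emb)                    # (batch, k * llm_dim)
        return out.view(-1, self.k, self.llm_dim)  # (batch, k, llm_dim)

# Usage sketch: prepend the k compressed vectors to the prompt's token
# embeddings, replacing the much longer raw retrieved snippets. All sizes
# below are illustrative placeholders.
projector = ContextProjector(enc_dim=768, llm_dim=4096, k=4)
ctx_emb = torch.randn(1, 768)            # stand-in for a retriever/encoder output
soft_tokens = projector(ctx_emb)         # (1, 4, 4096)
prompt_embs = torch.randn(1, 128, 4096)  # stand-in for embedded prompt tokens
inputs_embeds = torch.cat([soft_tokens, prompt_embs], dim=1)
# inputs_embeds can be passed to a decoder LLM via its `inputs_embeds`
# argument (supported by Hugging Face causal-LM models).
```

Under this reading, only the projector is trained and the prompt grows by k positions rather than by the full length of the retrieved snippets, which is consistent with the TTFT reduction the abstract reports.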
Supplementary Material: zip
Primary Area: generative models
Submission Number: 7847