Abstract: Current approaches commonly integrate repository-level code completion with retrieval-augmented generation. Specifically, private code repositories are utilized as retrieval databases, which aim to supply relevant code chunks to a large language model (LLM). However, incorporating multiple retrieved code chunks into an LLM increases the cost of inference. This not only decreases the efficiency of the LLM but also impairs the user experience. To address this, we introduce $\textbf{RepoLC}$, which uses a $\textbf{L}$ight module to $\textbf{C}$ompress the retrieved code, thereby reducing the inference cost of LLMs. We insert a Semantic Compressor Encoder (SCE) between the retriever and the generator. Specifically, the SCE compresses the retrieved code chunks into fewer high-level tokens and then projects them into the semantic space of the LLM. We propose a two-stage training scheme that trains the overall pipeline through semantic alignment followed by task alignment. Experimental results demonstrate that our approach achieves significant improvements on multiple datasets. Compared to other methods, our approach incurs minimal accuracy loss while achieving inference times nearly as fast as in-file-only completion.
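The compression step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the actual SCE is a learned encoder trained via the two-stage alignment scheme, whereas the stand-in below simply segment-pools a chunk's token embeddings into a few high-level tokens and applies a linear projection into an assumed LLM embedding dimension. All names and shapes here are hypothetical.

```python
import numpy as np

def compress_chunk(chunk_embeddings: np.ndarray,
                   num_summary_tokens: int,
                   projection: np.ndarray) -> np.ndarray:
    """Compress a retrieved chunk's token embeddings (T, d_enc) into
    num_summary_tokens high-level tokens in the LLM space (k, d_llm).

    Hypothetical stand-in for the Semantic Compressor Encoder (SCE):
    mean-pool contiguous segments, then linearly project. The real SCE
    is a trained module, not fixed pooling.
    """
    # Split the T encoder tokens into k contiguous segments and
    # mean-pool each segment into one high-level token.
    segments = np.array_split(chunk_embeddings, num_summary_tokens, axis=0)
    pooled = np.stack([seg.mean(axis=0) for seg in segments])  # (k, d_enc)
    # Project the pooled tokens into the LLM's semantic space.
    return pooled @ projection  # (k, d_llm)

# Example: 128 encoder tokens of width 64, compressed to 8 tokens of width 32,
# so the generator sees 8 soft tokens instead of 128 raw ones.
rng = np.random.default_rng(0)
chunk = rng.normal(size=(128, 64))
proj = rng.normal(size=(64, 32))
summary = compress_chunk(chunk, num_summary_tokens=8, projection=proj)
print(summary.shape)  # (8, 32)
```

The cost saving comes from the generator attending to 8 compressed tokens per chunk rather than the full chunk, which is where the claimed inference-time reduction would originate.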
Paper Type: Long
Research Area: Generation
Research Area Keywords: Generation
Languages Studied: English
Submission Number: 1967