LoRACode: LoRA Adapters for Code Embeddings

Published: 06 Mar 2025, Last Modified: 19 Apr 2025
Venue: DL4C @ ICLR 2025
License: CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: embeddings, code embedding models, code, peft, lora, lora adapters, low rank adaptation, parameter efficient finetuning methods, mean reciprocal rank, code retrieval, information retrieval, code2code, text2code
TL;DR: A novel method for fine-tuning code embedding models for code retrieval from text and code queries, using Low-Rank Adaptation, achieving substantial improvements in Mean Reciprocal Rank while training only a small fraction of the parameters.
Abstract: Code embeddings are essential for semantic code search; however, current approaches often struggle to capture the precise syntactic and contextual nuances inherent in code. Open-source models such as CodeBERT and UniXcoder exhibit limitations in scalability and efficiency, while high-performing proprietary systems impose substantial computational costs. We introduce a parameter-efficient fine-tuning method based on Low-Rank Adaptation (LoRA) to construct task-specific adapters for code retrieval. Our approach reduces the number of trainable parameters to less than two percent of the base model's, enabling rapid fine-tuning on extensive code corpora (2 million samples in 25 minutes on two H100 GPUs). Experiments demonstrate an increase of up to 9.1% in Mean Reciprocal Rank (MRR) for Code2Code search and up to 86.69% for Text2Code search tasks across multiple programming languages. Distinguishing between task-wise and language-wise adaptation reveals the sensitivity of code retrieval to syntactic and linguistic variation. To foster research in this area, we make our code and pre-trained models publicly available.
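The LoRA setup described in the abstract can be illustrated with a short sketch using the Hugging Face `peft` library. This is an illustration under assumptions, not the authors' released implementation: the base checkpoint (`microsoft/codebert-base`), rank, alpha, dropout, and target modules are example choices, picked so that the trainable share stays well under two percent of the base model.

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Open-source code embedding backbone (one of the models the paper names).
base_model = AutoModel.from_pretrained("microsoft/codebert-base")

config = LoraConfig(
    r=16,                               # low-rank dimension (assumed)
    lora_alpha=32,                      # LoRA scaling factor (assumed)
    lora_dropout=0.1,                   # dropout on the adapter path (assumed)
    target_modules=["query", "value"],  # attention projections in RoBERTa-style encoders
)

# Wrap the frozen base model with trainable low-rank adapters.
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
# Reports trainable vs. total parameter counts; with these settings the
# trainable fraction is well below 2% of the base model.
```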
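Mean Reciprocal Rank, the metric reported above, averages the reciprocal rank of the first relevant result over all queries: MRR = (1/|Q|) Σᵢ 1/rankᵢ. A minimal helper (not the paper's evaluation harness) makes the computation concrete:

```python
def mean_reciprocal_rank(first_relevant_ranks: list[int]) -> float:
    """MRR over a set of queries, given the 1-based rank at which each
    query's first relevant result was retrieved."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# Example: three queries whose correct snippets appear at ranks 1, 3, and 2.
print(mean_reciprocal_rank([1, 3, 2]))  # ≈ 0.611
```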
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 26