Track: long paper (up to 9 pages)
Keywords: Self-Distillation, Multi-Exit Transformer, Early Layer Representation, Large Language Model, Code Retrieval, Code Search, Code Embedder
TL;DR: MODULARSTARENCODER is a modular multi-exit encoder trained with a novel self-distillation mechanism to improve early-layer embeddings, enhancing code retrieval efficiency while allowing flexible model scaling for different computational constraints.
Abstract: Deploying language models often requires navigating model size vs. performance trade-offs to satisfy downstream latency constraints while preserving the model’s usefulness. Model distillation is commonly employed to reduce model size while maintaining acceptable performance. However, distillation can be inefficient, since it involves multiple training steps. In this work, we introduce MODULARSTARENCODER, a modular multi-exit encoder with 1B parameters that serves multiple code retrieval tasks. MODULARSTARENCODER is trained with a novel self-distillation mechanism that significantly improves lower-layer representations, allowing different portions of the model to be used while preserving a favorable performance trade-off. Our architecture focuses on enhancing text-to-code and code-to-code search by systematically capturing syntactic and semantic structures across multiple levels of representation. Specific encoder layers are targeted as exit heads, allowing higher layers to guide earlier layers during training. This self-distillation effect improves intermediate representations, increasing retrieval recall at no extra training cost. In addition to the multi-exit scheme, our approach integrates a repository-level contextual loss that fully utilizes the training context window, further enhancing the learned representations. We also release a new dataset constructed via code translation, seamlessly expanding traditional text-to-code benchmarks with code-to-code pairs across diverse programming languages. Experimental results highlight the benefits of self-distillation through multi-exit supervision.
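The sketch below is a minimal, hypothetical illustration of the multi-exit self-distillation idea described in the abstract: selected encoder layers act as exit heads, and earlier exits are pulled toward the deepest exit's embedding so that higher layers guide lower layers. The class and function names (MultiExitEncoder, self_distillation_loss), the choice of exit layers, mean pooling, and the cosine-similarity distillation term are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of multi-exit self-distillation (not the authors' code):
# selected encoder layers serve as exit heads, and earlier exits are trained
# to match the deepest exit's pooled embedding (teacher detached).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiExitEncoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=12,
                 exit_layers=(4, 8, 12)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers))
        self.exit_layers = set(exit_layers)

    def forward(self, input_ids):
        h = self.embed(input_ids)
        exits = []
        for i, layer in enumerate(self.layers, start=1):
            h = layer(h)
            if i in self.exit_layers:
                # Mean-pool tokens into one embedding per exit head.
                exits.append(h.mean(dim=1))
        return exits  # ordered from shallowest to deepest exit


def self_distillation_loss(exits):
    """Pull earlier exits toward the deepest exit's embedding, so that
    higher layers guide earlier layers during training."""
    teacher = exits[-1].detach()
    loss = 0.0
    for student in exits[:-1]:
        loss = loss + (1 - F.cosine_similarity(student, teacher, dim=-1)).mean()
    return loss / max(len(exits) - 1, 1)


# Usage: in practice this term would be combined with the task losses
# (e.g., a contrastive retrieval loss and the repository-level contextual loss).
model = MultiExitEncoder()
tokens = torch.randint(0, 32000, (2, 16))
exits = model(tokens)
loss = self_distillation_loss(exits)
loss.backward()
```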
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Andrea_Gurioli1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 60