SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion

Xiaohan Chen; Zhongying Pan; Quan Feng; Yu Tian; Shuqun Yang; Mengru Wang; Lina Gong; Yuxia Geng; Piji Li; Xiang Chen

18 Sept 2025 (modified: 05 Jan 2026), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Repository-Level Code Completion, Large Language Model

TL;DR: SaraCoder boosts code completion by refining retrieved snippets beyond superficial similarity to capture deep semantic and structural relevance.

Abstract: Although Retrieval-Augmented Generation (RAG) improves code completion, traditional retrieval methods suffer from redundant information and limited diversity within a restricted context window. To address this, we propose SaraCoder, a resource-optimized retrieval augmentation method. It maximizes the diversity and representativeness of the information placed in a limited context window, significantly improving the accuracy and reliability of repository-level code completion. Its core Hierarchical Feature Optimization module systematically refines candidates in four steps: distilling deep semantic relationships, pruning exact duplicates, assessing structural similarity with a novel graph-based metric that weights edits by their topological importance, and reranking results to maximize both relevance and diversity. In addition, an External-Aware Identifier Disambiguator module resolves cross-file symbol ambiguity via dependency analysis. Extensive experiments on the challenging CrossCodeEval and RepoEval-Updated benchmarks show that SaraCoder outperforms existing baselines across multiple programming languages and models. Our work shows that systematically refining retrieval results across multiple dimensions offers a new paradigm for building more accurate and resource-optimized repository-level code completion systems.
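
Illustrative sketch (not the authors' implementation): the abstract names two of the refinement steps, pruning exact duplicates and reranking for both relevance and diversity, without giving their details. The Python below shows one common way such steps can be combined, assuming whitespace-normalized hashing for duplicate removal and greedy Maximal Marginal Relevance (MMR) over token-set Jaccard similarity for reranking. The function names, the Jaccard measure, and the parameters k and lam are all hypothetical choices made for this example; the paper's own semantic and graph-based metrics are not reproduced here.

```python
import hashlib


def jaccard(a, b):
    """Token-set Jaccard similarity; a simple stand-in for the paper's metrics."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dedupe_exact(snippets):
    """Drop exact duplicates, keeping first occurrences.

    Assumption: snippets are compared after collapsing whitespace.
    """
    seen, unique = set(), []
    for s in snippets:
        key = hashlib.sha256(" ".join(s.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def rerank_mmr(query, snippets, k=3, lam=0.7):
    """Greedy MMR: balance relevance to the query against redundancy.

    lam close to 1 favors relevance; lam close to 0 favors diversity.
    """
    candidates = dedupe_exact(snippets)
    selected = []
    while candidates and len(selected) < k:
        def score(c):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * jaccard(query, c) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = "def load config path"
    retrieved = [
        "def load config path",
        "def load config path",   # exact duplicate, pruned before reranking
        "def load config file",
        "class Parser tokens",
    ]
    print(rerank_mmr(query, retrieved, k=2, lam=0.3))
```

With a low lam, the second pick tends to be a snippet that differs from the first rather than a near copy of it, which is the kind of diversity within a fixed context budget that the abstract describes.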

Supplementary Material: zip

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 10636
