CLARC: C/C++ Benchmark for Robust Code Search

ICLR 2026 Conference Submission 198 Authors

01 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Code Search, Benchmark, Robustness
Abstract: Effective retrieval of code snippets from natural language queries is essential for code reuse and developer productivity. However, current benchmarks are limited: they predominantly focus on Python, lack support for industry-focused languages like C/C++, miss structured categorization, and are susceptible to models that exploit superficial lexical features instead of code semantics. To address these limitations, we introduce CLARC (C/C++ LAnguage Retrieval with Anonymized Code), a benchmark of 1,245 C/C++ query-code pairs that is fully compilable, configurable, and extensible. CLARC systematically categorizes snippets into three groups based on dependency complexity, allowing for a nuanced evaluation of retrieval performance under varying levels of code complexity. CLARC also provides configurable settings, including anonymized identifiers and low-level representations, to evaluate model robustness across different levels of code context and abstraction. Evaluation of six state-of-the-art code search methods shows significant performance drops under identifier anonymization, exposing existing models’ persistent reliance on superficial cues. Their poor performance on low-level languages such as Assembly and WebAssembly further reveals limited effectiveness beyond high-level programming languages. We also introduce an automated pipeline for scalable benchmark generation, validated through hypothesis tests, enabling the efficient creation of high-quality code search datasets that can be reused by other dataset builders. Our dataset is publicly available at https://huggingface.co/datasets/ClarcTeam/CLARC.
Primary Area: datasets and benchmarks
Submission Number: 198