C2Rust-Bench: A Minimized, Representative Benchmark for C-to-Rust Transpilation

ICLR 2026 Conference Submission22519 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: LLM Transpiler, Transpilation Evaluation, C, Rust, Memory Safety, Benchmark

TL;DR: C-to-Rust transpilation is key to addressing memory safety issues in C code, but current evaluation relies on large, unwieldy datasets. We propose C2Rust-Bench, a dataset of only 2,905 functions that offers an objectively representative benchmark.

Abstract: Despite significant effort in vulnerability detection over the last two decades, memory safety vulnerabilities continue to be a systemic problem that affects most mainstream software. Recent reports have concluded that the key to solving this issue once and for all is to migrate legacy C code to memory-safe languages. To this end, C-to-Rust "transpilation" has become a popular research topic. Recent work has proposed various approaches; however, the community still lacks a comprehensive evaluation dataset. Researchers currently pursue completeness through sheer sample volume, but this bloats the time required to run experiments and makes verification, which is done manually, laborious. In this work, we propose a method for selecting functions from a large set to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation systems. Using this method, we build C2Rust-Bench, a dataset of only 2,905 functions that nevertheless offers an objectively representative benchmark for C-to-Rust transpilation. This dataset was distilled from 15,503 real-world functions encompassing previous work.

Primary Area: datasets and benchmarks

Submission Number: 22519
