Keywords: LLM Transpiler, Transpilation Evaluation, C, Rust, Memory Safety, Benchmark
TL;DR: C-to-Rust transpilation is key to addressing memory safety issues in C code, but current evaluation relies on large, unwieldy datasets. We propose C2Rust-Bench, a dataset of only 2,905 functions that offers an objectively representative benchmark.
Abstract: Despite significant effort in vulnerability detection over the last two decades, memory safety vulnerabilities remain a systemic problem that affects most mainstream software. Recent reports have concluded that the key to solving this issue once and for all is to migrate legacy C code to memory-safe languages. To this end, C-to-Rust "transpilation" has become a popular research topic. Recent work has proposed various approaches; however, the community still lacks a comprehensive evaluation dataset. Currently, researchers pursue completeness through sheer sample volume, but this inflates the time required to run experiments and makes verification, which is currently done manually, laborious. In this work, we propose a method for selecting functions from a large set to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation systems. The result is C2Rust-Bench, a dataset of only 2,905 functions that nevertheless forms an objectively representative benchmark for C-to-Rust transpilation. This dataset was distilled from a pool of 15,503 real-world functions that encompasses those used in previous work.
Primary Area: datasets and benchmarks
Submission Number: 22519