Keywords: Large Language Models, Code Translation, Software Engineering, C, Rust, Static Analysis
TL;DR: We present SACTOR, a static-analysis-guided, two-step LLM-based C-to-Rust translator that produces verified, idiomatic, and unsafe-free Rust code.
Abstract: Translating software written in C to Rust can substantially improve memory safety while maintaining high performance. However, manual translation is cumbersome, error-prone, and often produces unidiomatic code. Large language models (LLMs) have shown promise in producing idiomatic translations, but they offer no correctness guarantees because they lack the ability to capture the semantic differences between the source and target languages.
We propose SACTOR, an LLM-driven C-to-Rust translation tool that uses a two-step process: an initial "unidiomatic" translation that preserves semantics, followed by an "idiomatic" refinement that aligns the code with Rust conventions. SACTOR uses static analysis of the C source to handle pointer semantics and dependency resolution.
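To make the two-step process concrete, here is an illustrative sketch (not SACTOR's actual output) of how a small C function might look after each step. The C function `sum`, the Rust names `sum_unidiomatic` and `sum_idiomatic`, and the use of wrapping arithmetic are hypothetical choices. The sketch assumes static analysis has established that the pointer argument is read-only and indexed only within `[0, n)`, which is what would allow the pointer/length pair to become a borrowed slice.

```rust
// Hypothetical C source being translated:
//   int sum(const int *arr, size_t n) {
//       int total = 0;
//       for (size_t i = 0; i < n; i++) total += arr[i];
//       return total;
//   }

/// Step 1 (sketch): "unidiomatic" translation that mirrors the C code
/// directly, keeping the raw pointer and explicit length.
///
/// # Safety
/// `arr` must point to at least `n` valid, initialized `i32` values.
pub unsafe fn sum_unidiomatic(arr: *const i32, n: usize) -> i32 {
    let mut total: i32 = 0;
    for i in 0..n {
        // Signed overflow is undefined behavior in C; wrapping here is an
        // assumption about the intended semantics, not a faithful model.
        total = total.wrapping_add(unsafe { *arr.add(i) });
    }
    total
}

/// Step 2 (sketch): "idiomatic" refinement. Assuming the pointer is
/// read-only and only accessed in `[0, n)`, the pointer/length pair
/// becomes a `&[i32]` slice and the body no longer needs `unsafe`.
pub fn sum_idiomatic(arr: &[i32]) -> i32 {
    arr.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

The first step keeps the C data layout so behavior is easy to compare; the second step relies on the pointer facts to move bounds handling into the type system.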
To validate the correctness of each translation step, we use end-to-end testing via the C/Rust foreign function interface (FFI).
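One way to run existing C tests against a translated function is to expose it through a C-ABI shim that the C test harness can link against in place of the original function. The sketch below assumes that design and is not SACTOR's implementation. The names `sum` and `sum_ffi` are hypothetical, and a real harness would export the original C symbol name (for example with `#[no_mangle]`), which this sketch omits.

```rust
/// Idiomatic Rust translation (hypothetical) of the C function
/// `int sum(const int *arr, size_t n)`.
pub fn sum(arr: &[i32]) -> i32 {
    arr.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// C-ABI shim (sketch) exposing `sum` with the original C signature, so
/// end-to-end tests written in C could call the Rust code unchanged.
///
/// # Safety
/// If `n > 0`, `arr` must be non-null and point to `n` valid `i32` values.
pub unsafe extern "C" fn sum_ffi(arr: *const i32, n: usize) -> i32 {
    if n == 0 {
        // `slice::from_raw_parts` requires a non-null pointer even for an
        // empty slice, so handle the empty case without touching `arr`.
        return sum(&[]);
    }
    // Convert the raw pointer/length pair to a slice only at the boundary.
    let slice = unsafe { std::slice::from_raw_parts(arr, n) };
    sum(slice)
}
```

Keeping the pointer-to-slice conversion inside the shim means the translated function itself stays free of `unsafe`, while the C-side tests still exercise it through the same interface as the original code.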
We evaluate SACTOR's translation of 200 programs from two datasets, as well as two case studies, comparing the performance of GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, Llama 3.3 70B, and DeepSeek-R1 in SACTOR.
Our results demonstrate that SACTOR achieves high correctness and improved idiomaticity: the best-performing model (DeepSeek-R1) reaches 93% and 84% correctness on the two datasets, respectively. Compared to existing methods, SACTOR generates more idiomatic, Rust-compliant code, reduces Clippy lint alerts by up to 7×, and produces unsafe-free translations on both datasets.
Submission Number: 64