Rust-doctor: Enhanced Feature for Rust Ownership and Lifetime Repair with Balanced Training Data Generation

ACL ARR 2025 May Submission 5333 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Rust, a relatively new programming language, has gained significant popularity in recent years thanks to its compile-time safety guarantees. However, its safety rules have a steep learning curve, and developers often struggle with the strict compiler checks that enforce them. To make matters worse, the scarcity of training data and Rust's unique semantics lead to poor performance from learning-based automated program repair techniques. To address these challenges, we propose a novel error injection approach that generates a balanced training dataset, and we leverage the Mid-level Intermediate Representation (MIR) as an enhanced feature for repairing Rust-specific compilation errors. Using these innovations, we fine-tuned a new code model, LLaRRA (Large Language and Rust Repair Assistant). Experimental results demonstrate that LLaRRA significantly outperforms state-of-the-art models in terms of Pass@K and Acc@K.
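
To illustrate the class of errors the abstract refers to, the following minimal sketch shows one ownership error and one lifetime error, each next to a repaired form. The functions `total_len`, `longest`, and `demo` are hypothetical and are not taken from the paper's dataset or from LLaRRA's output; the broken variants appear only in comments so that the block compiles.

```rust
// Hypothetical illustration of Rust-specific compilation errors and their
// repairs; none of this code comes from the paper itself.

// Ownership repair: borrowing a slice instead of taking `Vec<String>` by value.
// Broken variant: `fn total_len(words: Vec<String>) -> usize` would move the
// vector into the callee, and any later use of it by the caller is rejected
// with E0382 (use of moved value).
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Lifetime repair: an explicit lifetime ties the result to both inputs.
// Broken variant: `fn longest(a: &str, b: &str) -> &str` is rejected with
// E0106 (missing lifetime specifier), because the compiler cannot infer
// which argument the returned reference borrows from.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Uses `words` after the call to `total_len`, which is only legal because
// the repaired signature borrows the vector instead of moving it.
fn demo() -> (usize, String) {
    let words = vec![String::from("own"), String::from("borrow")];
    let n = total_len(&words);
    let l = longest(&words[0], &words[1]).to_string();
    (n, l)
}
```

Calling `demo()` returns the combined length of both words together with the longer word. Both repairs change only a function signature, which hints at why such fixes are short to write yet hard to find without understanding the borrow checker's rules.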

Paper Type: Long

Research Area: Multilingualism and Cross-Lingual NLP

Research Area Keywords: less-resourced languages, software and tools

Contribution Types: Publicly available software and/or pre-trained models, Data resources

Languages Studied: Programming language, Rust

Submission Number: 5333
