Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation
Abstract: Code translation benchmarks are crucial for evaluating the accuracy and efficiency of LLM-based translation systems. However, existing benchmarks focus on individual functions, neglecting repository-level challenges such as inter-module coherence and dependency management. While some recent repository-level benchmarks attempt to address these issues, they suffer from poor maintainability and coarse evaluation granularity, limiting their usefulness to developers. We introduce Skeleton-Guided-Translation, a framework for repository-level Java-to-C# translation with fine-grained quality evaluation. It follows a two-step process: first translating each repository's "skeleton," then refining the full repository guided by that skeleton. Building on this framework, we present TransRepo-bench, a benchmark of high-quality Java repositories paired with corresponding C# skeletons, matching unit tests, and build configurations. Our adaptive unit tests support multiple or incremental translations without manual adjustment, enhancing automation and scalability. We also introduce fine-grained metrics that assess translation quality at the level of individual test cases, overcoming the limitation of traditional binary metrics, which cannot distinguish partial progress from outright build failure. Evaluations on TransRepo-bench reveal issues such as broken cross-file references and show that our structured approach reduces dependency errors and preserves interface consistency.
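The contrast between binary and test-case-level scoring can be illustrated with a minimal sketch. This is not the paper's evaluation code; the function names and sample test results below are hypothetical, assuming only that per-test pass/fail outcomes are available for a translated repository.

```python
# Hypothetical sketch contrasting a binary repo-level metric with a
# fine-grained, per-test-case metric (all names are illustrative).

def binary_score(build_ok: bool, all_tests_pass: bool) -> float:
    """Traditional metric: 1 only if the repo builds AND every test passes."""
    return 1.0 if build_ok and all_tests_pass else 0.0

def fine_grained_score(test_results: dict) -> float:
    """Fraction of unit-test cases the translated repository passes."""
    if not test_results:
        return 0.0
    return sum(test_results.values()) / len(test_results)

# Example: a translation that passes 3 of 4 test cases.
results = {"testParse": True, "testSerialize": True,
           "testMerge": True, "testRoundTrip": False}

print(binary_score(build_ok=True, all_tests_pass=all(results.values())))  # 0.0
print(fine_grained_score(results))  # 0.75
```

Under the binary metric, this partial translation scores the same as one that fails to build at all; the fine-grained score credits the test cases that pass, which is the distinction the abstract's evaluation granularity argument rests on.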
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation methodologies, NLP datasets
Contribution Types: Data resources
Languages Studied: Java, C#
Submission Number: 3601