Alignment-Guided Curriculum Learning for Semi-Supervised Code Translation

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Neural code translation is the task of converting source code from one programming language to another. The scarcity of parallel code data impedes code translation models' ability to learn accurate cross-language alignment, limiting further performance gains. In this paper, we introduce MIRACLE, a semi-supervised approach that improves code translation through curriculum learning on code data with ascending alignment levels. To address data scarcity, it leverages static analysis and compilation to generate synthetic parallel code datasets with enhanced alignment. Extensive experiments show that MIRACLE significantly improves code translation performance on C++, Java, Python, and C, surpassing state-of-the-art models by substantial margins. Notably, it achieves up to a 43% improvement in C code translation with fewer than 150 annotated examples.
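The curriculum idea in the abstract — training first on weakly aligned code data and progressing to strongly aligned pairs — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `CodePair` class, the three-level alignment scale, and the `curriculum_schedule` function are all assumed names introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class CodePair:
    """A (source, target) training example with an assumed alignment level.

    Hypothetical scale: 0 = loosely comparable code, 1 = statically matched
    snippets, 2 = compilation-verified parallel pairs.
    """
    src: str
    tgt: str
    alignment: int

def curriculum_schedule(pairs):
    """Order training examples by ascending alignment level, so the model
    sees noisier data first and the best-aligned pairs last."""
    return sorted(pairs, key=lambda p: p.alignment)

# Usage: three toy examples, deliberately out of order.
pairs = [
    CodePair("int f() { return 1; }", "def f(): return 1", 2),
    CodePair("// utils.c", "# utils.py", 0),
    CodePair("int x = 0;", "x = 0", 1),
]
for p in curriculum_schedule(pairs):
    print(p.alignment, p.src)
```

In a real pipeline the alignment level would come from the static-analysis and compilation checks the abstract describes; here it is simply a stored integer.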
Paper Type: long
Research Area: Machine Translation
Contribution Types: Approaches to low-resource settings
Languages Studied: Python, Java, C++, C
