Alignment-Guided Curriculum Learning for Semi-Supervised Code Translation

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Neural code translation is the task of converting source code from one programming language to another. The scarcity of parallel code data impedes code translation models' ability to learn accurate cross-language alignment, limiting further performance gains. In this paper, we introduce MIRACLE, a semi-supervised approach that improves code translation through curriculum learning on code data with ascending alignment levels. To address data scarcity, it leverages static analysis and compilation to generate synthetic parallel code datasets with enhanced alignment. Extensive experiments show that MIRACLE significantly improves code translation performance on C++, Java, Python, and C, surpassing state-of-the-art models by substantial margins. Notably, it achieves up to a 43% improvement in C code translation with fewer than 150 annotated examples.
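The curriculum idea in the abstract — training first on weakly aligned code data and progressing to strongly aligned pairs — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `CodePair` class, the three-level alignment scale, and the `curriculum_schedule` function are all assumed names introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class CodePair:
    """A (source, target) training example with an assumed alignment level.

    Hypothetical scale: 0 = loosely comparable code, 1 = statically matched
    snippets, 2 = compilation-verified parallel pairs.
    """
    src: str
    tgt: str
    alignment: int

def curriculum_schedule(pairs):
    """Order training examples by ascending alignment level, so the model
    sees noisier data first and the best-aligned pairs last."""
    return sorted(pairs, key=lambda p: p.alignment)

# Usage: three toy examples, deliberately out of order.
pairs = [
    CodePair("int f() { return 1; }", "def f(): return 1", 2),
    CodePair("// utils.c", "# utils.py", 0),
    CodePair("int x = 0;", "x = 0", 1),
]
for p in curriculum_schedule(pairs):
    print(p.alignment, p.src)
```

In a real pipeline the alignment level would come from the static-analysis and compilation checks the abstract describes; here it is simply a stored integer.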
Paper Type: long
Research Area: Machine Translation
Contribution Types: Approaches to low-resource settings
Languages Studied: Python, Java, C++, C
