Alignment-Enhancing Parallel Code Generation for Semi-Supervised Code Translation

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Code Translation; Machine Translation; Code Generation; Semi-Supervised Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A semi-supervised code translation method that leverages parallel code generation with enhanced cross-lingual alignment.
Abstract: Code translation is the task of converting source code from one programming language to another. Sufficient parallel code data is essential for neural code translation models to learn correct alignments across languages; however, existing parallel code data is limited in both quantity and language coverage. In this paper, we propose SPACoder, a semi-supervised code translation method that leverages snippet training, static analysis, and compilation to generate synthetic parallel code with enhanced alignment in a scalable way, and that improves code translation through curriculum learning based on the alignment level of training instances. SPACoder generalizes to multiple languages and various models with little overhead. Extensive experiments show that SPACoder significantly improves code translation performance on C++, Java, Python, and C, outperforming state-of-the-art baselines by wide margins in execution-based evaluation (CA@1). Notably, we improve C translation by up to 43% with fewer than 150 annotated training instances.
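The abstract describes scheduling synthetic parallel pairs by their alignment level for curriculum learning. As a rough, non-authoritative sketch of that idea (the paper's actual scoring and scheduling are not given on this page; `ParallelPair`, `alignment`, and `curriculum_batches` are hypothetical names introduced only for illustration), one could stage training data from best- to worst-aligned pairs:

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class ParallelPair:
    """A (source, target) code pair with an alignment score in [0, 1]."""
    src: str          # e.g., a C++ snippet
    tgt: str          # e.g., its Python counterpart
    alignment: float  # higher = better cross-lingual alignment (assumed scoring)

def curriculum_batches(pairs: List[ParallelPair],
                       n_stages: int = 3,
                       batch_size: int = 32) -> Iterator[List[ParallelPair]]:
    """Yield training batches stage by stage, from highest- to lowest-alignment data.

    Stage 0 contains only the best-aligned (easiest) pairs; each later stage
    cumulatively mixes in progressively noisier synthetic data.
    """
    ranked = sorted(pairs, key=lambda p: p.alignment, reverse=True)
    stage_size = max(1, len(ranked) // n_stages)
    for stage in range(n_stages):
        # Cumulative curriculum: each stage trains on all data admitted so far.
        pool = ranked[: (stage + 1) * stage_size]
        for i in range(0, len(pool), batch_size):
            yield pool[i : i + batch_size]
```

In this sketch, the alignment score would come from the paper's snippet training, static analysis, and compilation signals; the cumulative three-stage schedule is an assumption, not the method reported in the paper.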
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6285