InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation

Published: 06 Mar 2025, Last Modified: 19 Apr 2025
Venue: DL4C @ ICLR 2025
License: CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: automated code translation, large language models, intermediate representation, tree of code translation
TL;DR: A novel approach to code translation that uses programming languages as intermediate representations, planned via the Tree of Code Translation (ToCT) algorithm. Results demonstrate an absolute improvement of 18.3% to 43.3% in Computation Accuracy (CA) over Direct Translation with 10 attempts.
Abstract: Code translation, the process of converting code between programming languages (PLs), is essential for modernizing legacy systems and ensuring cross-platform compatibility. Despite recent advancements, automated code translation, including methods based on large language models (LLMs), still encounters challenges due to syntactic and semantic mismatches between PLs. In this paper, we introduce InterTrans, an LLM-based automated code translation approach that, unlike existing methods, leverages intermediate translations to bridge the syntactic and semantic gaps between source and target PLs. InterTrans uses a novel Tree of Code Translation (ToCT) algorithm to plan transitive intermediate translation sequences between a given source and target PL, then validates them in a specific order. We evaluate InterTrans with three open LLMs on three benchmarks involving six PLs. Results demonstrate an absolute improvement of 18.3% to 43.3% in Computation Accuracy (CA) for InterTrans compared to Direct Translation with 10 attempts. The best-performing variant of InterTrans (using the Magicoder LLM) achieved an average CA of 87.3% to 95.4% across the three benchmarks.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 39
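
The planning-then-validation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ToCT implementation: the function names (`plan_translation_paths`, `translate_via_paths`), the `max_intermediates` bound, the shortest-path-first validation order, and the `translate_step` / `passes_tests` callbacks are all assumptions made for illustration, since the abstract states only that sequences are planned and then validated "in a specific order".

```python
from itertools import permutations


def plan_translation_paths(source, target, languages, max_intermediates=2):
    """Enumerate candidate paths source -> ... -> target.

    Assumption: paths are ordered shortest first, so the direct
    translation is tried before any path with intermediate languages.
    """
    intermediates = [pl for pl in languages if pl not in (source, target)]
    paths = []
    for k in range(max_intermediates + 1):
        # Every ordered choice of k distinct intermediate languages.
        for middle in permutations(intermediates, k):
            paths.append((source, *middle, target))
    return paths


def translate_via_paths(code, paths, translate_step, passes_tests):
    """Try each path in order; return the first (path, code) that passes.

    `translate_step(code, src, dst)` stands in for an LLM call and
    `passes_tests(code)` for running the benchmark's test cases; both
    are hypothetical callbacks supplied by the caller.
    """
    for path in paths:
        current = code
        for src, dst in zip(path, path[1:]):
            current = translate_step(current, src, dst)
            if current is None:  # this hop failed; abandon the path
                break
        else:
            if passes_tests(current):
                return path, current
    return None
```

With stub callbacks in place of the LLM and the test harness, `plan_translation_paths("Python", "Java", ["Python", "Java", "Go", "Rust"], max_intermediates=1)` yields the direct path followed by the paths through Go and through Rust, and `translate_via_paths` stops at the first path whose final output passes validation.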