Abstract: Program translation aims to translate source code from one programming language (PL) to another. Current research on code translation predominantly focuses on high-resource PLs such as Python and Java, leaving low-resource languages insufficiently explored. Fortunately, the rapid advancement of Large Language Models (LLMs) has created new opportunities for research on low-resource PLs. To mitigate this gap in the era of foundation models, we introduce OptCodeTrans, a two-phase post-training approach consisting of continued pre-training followed by instruction fine-tuning. We also provide a high-quality dataset covering three low-resource PLs that represent different programming paradigms: Cangjie, Julia, and OCaml. Extensive experiments demonstrate the effectiveness of OptCodeTrans, which achieves an average improvement of 10.28 BLEU points and 5.15 points in functional equivalence across all translation tasks and backbone models. Our work provides valuable insights into effective post-training strategies for adapting LLMs to low-resource code translation.
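The abstract does not specify the implementation, but the two phases it names map onto a standard causal-LM training recipe: continued pre-training on raw low-resource code, then instruction fine-tuning on translation pairs. Below is a minimal sketch of such a pipeline using Hugging Face transformers; the backbone checkpoint, dataset paths, prompt template, and hyperparameters are all illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of a generic two-phase post-training pipeline
# (continued pre-training, then instruction fine-tuning).
# ASSUMPTIONS: the backbone checkpoint, dataset files, prompt
# template, and hyperparameters are illustrative placeholders,
# not OptCodeTrans's actual configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bigcode/starcoder2-3b"  # hypothetical backbone choice
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Causal-LM collator: pads batches and derives labels from input ids.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

def train_phase(dataset, output_dir, epochs):
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=epochs),
        train_dataset=dataset.map(
            tokenize, batched=True, remove_columns=dataset.column_names
        ),
        data_collator=collator,
    ).train()

# Phase 1: continued pre-training on raw low-resource code
# (e.g., Cangjie / Julia / OCaml files); the path is a placeholder.
cpt = load_dataset("json", data_files="lowres_code.jsonl", split="train")
train_phase(cpt, "phase1_cpt", epochs=1)

# Phase 2: instruction fine-tuning on translation pairs rendered as
# (instruction, source snippet) -> target snippet prompts.
def to_prompt(ex):
    return {"text": f"Translate the following {ex['src_lang']} code "
                    f"to {ex['tgt_lang']}:\n{ex['src_code']}\n"
                    f"### Translation:\n{ex['tgt_code']}"}

ift = load_dataset("json", data_files="translation_pairs.jsonl", split="train")
train_phase(ift.map(to_prompt), "phase2_ift", epochs=2)
```

The key design point this sketch reflects is that both phases share the same next-token objective; only the data changes, from monolingual target-language code to instruction-formatted translation pairs.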