Abstract: Translating programs between parallel programming languages is an important problem in the high-performance computing (HPC) community, with implications for both industry and academia. Existing tools for this problem are too narrow in scope (translating only between specific languages), outdated (requiring maintenance), or both. The recent explosive growth in the popularity of large language models (LLMs), together with their ability to generate and translate code, offers a potential alternative approach. Toward that end, we first need to systematically evaluate the ability of LLMs to translate between parallel languages. In this work, we introduce UniPar, a systematic evaluation framework for LLM-based parallel code translation. Specifically, we target translations between serial code, CUDA, and OpenMP. Our goal is to assess how well current instruction-tuned LLMs, specifically GPT-4o-mini and LLaMA-3.3-70B-Instruct, can be used out of the box or enhanced through known strategies. We evaluate four major usage modes: hyperparameter optimization for decoding, zero- and few-shot prompting, supervised fine-tuning, and iterative feedback through compiler-based repair. As part of the evaluation, we construct a new dataset called ParaTrans, covering both serial-to-parallel translation and cross-paradigm transformations. Our findings reveal that while off-the-shelf models struggle under the default settings (e.g., GPT-4o-mini achieves only 46% compilation and 15% functional correctness), our UniPar methodology, which combines fine-tuning, hyperparameter tuning, and compiler-guided repair, improves performance by up to 2X (69% compilation and 33% functional correctness). We believe that our findings will provide useful insights for researchers to further improve LLMs for the parallel language translation problem. UniPar source code and ParaTrans dataset are available at our GitHub repository.
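To make the "iterative feedback through compiler-based repair" mode concrete, here is a minimal sketch of what such a loop might look like. It is not UniPar's implementation: the names `translate_with_repair`, `toy_translate`, and `toy_compile` are hypothetical, and the model call and the compiler are replaced by stand-in callables so the sketch runs without an LLM or a CUDA/OpenMP toolchain.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   translate(source, feedback) -> candidate translated code
#   compile_check(candidate)    -> (compiled_ok, diagnostics)
Translator = Callable[[str, Optional[str]], str]
Compiler = Callable[[str], Tuple[bool, str]]


def translate_with_repair(
    source: str,
    translate: Translator,
    compile_check: Compiler,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a translation, then feed compiler diagnostics back
    to the translator until the candidate compiles or rounds run out.
    Returns (compiling candidate or None, number of rounds used)."""
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        candidate = translate(source, feedback)
        ok, diagnostics = compile_check(candidate)
        if ok:
            return candidate, round_idx
        # Pass the compiler output to the next translation attempt.
        feedback = diagnostics
    return None, max_rounds


def toy_translate(source: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM call: the first attempt omits the OpenMP
    header, and any feedback triggers a version that includes it."""
    body = "#pragma omp parallel for\n" + source
    if feedback is None:
        return body
    return "#include <omp.h>\n" + body


def toy_compile(candidate: str) -> Tuple[bool, str]:
    """Stand-in for a real compiler invocation: accepts the candidate
    only if it includes the OpenMP header."""
    if "#include <omp.h>" in candidate:
        return True, ""
    return False, "error: missing #include <omp.h>"


if __name__ == "__main__":
    serial = "for (int i = 0; i < n; i++) c[i] = a[i] + b[i];"
    code, rounds = translate_with_repair(serial, toy_translate, toy_compile)
    print(f"compiled after {rounds} round(s):\n{code}")
```

In a real setting, `compile_check` would presumably wrap an actual compiler invocation and `translate` would build a prompt containing the source and the diagnostics. Note that compiling is only the first of the two metrics reported above; functional correctness would need a separate check against reference outputs.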
External IDs:dblp:conf/hpec/BitanKKMCMHO25