Keywords: Code Translation, Large Language Models (LLMs), High-Performance Computing (HPC), Code Modernization, Code Portability
TL;DR: We create an easily extensible tool that leverages three independent LLM and validator feedback loops to outperform the SOTA in source-to-source code translation.
Abstract: We present an LLM-based code translation and repair framework called TRI-anslate, which translates existing code written in an arbitrary source language to an arbitrary target language and validates that the output code adheres to desired properties via testing.
Existing work has shown that LLMs are remarkably capable at code translation and repair tasks. Furthermore, specialized fine-tuned or distilled LLMs can extend these capabilities to handle niche languages, perform syntax repair at relatively low cost, or perform semantic repair that takes common errors into account.
However, the most robust tools currently available that leverage these LLMs assign all of these distinct subtasks to a single LLM with a feedback loop from a validation tool. Furthermore, they rely on a rigid set of possible errors as part of the corrective feedback from the validator or verifier. By contrast, TRI-anslate allows for a user-specified error set and uses three separate LLM feedback loops to take full advantage of LLMs specialized for generation, syntactic repair, and semantic repair. This separation also avoids spending the context of later LLMs on the correction conversations of earlier ones.
We conduct an extensive evaluation showing the advantage of TRI-anslate over existing work under the same setup (an $\approx 8$% increase with the base model and an $\approx 45$% increase with the fine-tuned model for CUDA to OpenMP target offloading translation). We also demonstrate that choosing a different model per subtask allows TRI-anslate to outperform LASSI using any of the individual models, and we highlight the extensibility of TRI-anslate by documenting the effort required to add a new translation task (CUDA to SYCL).
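The three-loop structure described in the abstract can be illustrated with a minimal sketch. It is not the TRI-anslate implementation: the names `Stage`, `run_loop`, and `translate`, the callable signatures, and the retry limit are all assumptions made for illustration. The point it shows is that each stage pairs its own LLM with its own user-supplied validator, and that only the current code, never an earlier stage's correction conversation, is handed to the next stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
# an LLM maps (text, error feedback or None) to new code;
# a validator returns None on success or an error description on failure.
LLM = Callable[[str, Optional[str]], str]
Validator = Callable[[str], Optional[str]]


@dataclass
class Stage:
    name: str
    llm: LLM
    validate: Validator  # user-supplied, so the error set is user-defined
    max_rounds: int = 3  # assumed retry budget per loop


def run_loop(stage: Stage, text: str, transform_first: bool = False) -> Tuple[str, bool]:
    """Run one LLM/validator feedback loop and report whether it converged."""
    # The generation stage must transform its input before validating it;
    # repair stages validate first and call the LLM only when needed.
    candidate = stage.llm(text, None) if transform_first else text
    for _ in range(stage.max_rounds):
        error = stage.validate(candidate)
        if error is None:
            return candidate, True
        # Only the latest candidate and its error are passed to the LLM.
        candidate = stage.llm(candidate, error)
    return candidate, stage.validate(candidate) is None


def translate(source: str, generate: Stage, syntax_repair: Stage,
              semantic_repair: Stage) -> Tuple[str, bool]:
    """Chain generation, syntactic repair, and semantic repair loops."""
    code, ok = run_loop(generate, source, transform_first=True)
    if not ok:
        return code, False
    for stage in (syntax_repair, semantic_repair):
        # Each stage starts from the code alone: no earlier conversation
        # history is carried into the next stage's context.
        code, ok = run_loop(stage, code)
        if not ok:
            return code, False
    return code, True
```

In practice each `llm` callable would wrap a model chosen for that subtask, and each `validate` callable would wrap a compiler, test harness, or other checker; swapping either one changes a single stage without affecting the others.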
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 21582