ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency

Published: 26 Sept 2024, Last Modified: 13 Nov 2024NeurIPS 2024 Track Datasets and Benchmarks PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Compiler Backend; Code Dataset; Code Generation
TL;DR: This paper introduces ComBack, the first public dataset designed for compiler backend development, aimed at improving programmers' productivity for compiler backend development.
Abstract: Compiler backends are tasked with generating executable machine code for processors. With the proliferation of diverse processors, it is imperative for programmers to tailor specific compiler backends to accommodate each one. Meanwhile, compiler backend development is a laborious and time-consuming task, lacking effective automation methods. Although language models have demonstrated strong abilities in code related tasks, the lack of appropriate datasets for compiler backend development limits the application of language models in this field. In this paper, we introduce ComBack, the first public dataset designed for improving compiler backend development capabilities of language models. ComBack includes 178 backends for mainstream compilers and three tasks including statement-level completion, next-statement suggestion and code generation, representing common development scenarios. We conducted experiments by fine-tuning six pre-trained language models with ComBack, demonstrating its effectiveness in enhancing model accuracy across the three tasks. We further evaluated the top-performing model(CodeT5+) across the three tasks for new targets, comparing its accuracy with conventional methods (Fork-Flow), ChatGPT-3.5-Turbo, and Code-LLaMA-34B-Instruct. Remarkably, fine-tuned CodeT5+ with only 220M parameters on ComBack outperformed Fork-Flow methods significantly and surpassed ChatGPT and Code-LLaMA. This suggests potential efficiency improvements in compiler development. ComBack is avaliable at https://huggingface.co/datasets/docz-ict/ComBack.
Supplementary Material: zip
Submission Number: 258
Loading