Enhancing Code Generation for Dataflow Programming: Fine-Tuning Large Language Models with the DFCPP Dataset
Abstract: In recent years, large language models (LLMs) based on the Transformer architecture have demonstrated strong performance in code generation, but dataflow languages have received comparatively little attention. This study proposes a scheme for fine-tuning large language models on the DFCPP dataset. We show that the fine-tuned model can generate dataflow graph (DAG) topologies and achieves significant performance gains. Experimental results show that the fine-tuned model reaches a BLEU score of 0.193 on the DFCPP code generation task, a 112.1% improvement over the non-fine-tuned model (0.091), demonstrating the effectiveness of fine-tuning for domain-specific code generation.
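The reported improvement can be checked with a standard BLEU implementation. The sketch below is illustrative, assuming whitespace tokenization and NLTK's sentence-level BLEU (the paper's exact tokenizer and BLEU variant are not specified here); the function and variable names are hypothetical. It scores a generated snippet against a reference and reproduces the 112.1% relative gain from the two reported scores.

```python
# Minimal sketch, not the paper's evaluation harness: token-level BLEU for
# generated DFCPP code and the relative improvement reported in the abstract.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def bleu_for_code(reference_code: str, generated_code: str) -> float:
    """BLEU between a reference snippet and a generated snippet,
    using simple whitespace tokenization (an assumption)."""
    reference_tokens = reference_code.split()
    generated_tokens = generated_code.split()
    # Smoothing avoids zero scores when short snippets miss higher-order n-grams.
    smoothing = SmoothingFunction().method1
    return sentence_bleu([reference_tokens], generated_tokens,
                         smoothing_function=smoothing)


# Relative improvement from the two scores reported in the abstract:
baseline_bleu = 0.091   # non-fine-tuned model
finetuned_bleu = 0.193  # fine-tuned model
improvement = (finetuned_bleu - baseline_bleu) / baseline_bleu
print(f"Relative improvement: {improvement:.1%}")  # ~112.1%
```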