EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0
Abstract: As large language models (LLMs) play an increasingly important role in code generation, enhancing both correctness and efficiency has become crucial. Current methods primarily focus on correctness, often overlooking efficiency. To address this gap, we introduce SWIFTCODE, which improves both aspects by fine-tuning LLMs on a high-quality dataset of correct and efficient code samples. Our methodology leverages multiple LLMs to generate diverse candidate solutions for tasks across different programming languages. We then evaluate these solutions by directly measuring their execution time and memory usage through local execution, and select the solution with the lowest execution time and memory consumption as the final output for each task. Experimental results demonstrate significant improvements when fine-tuning with SWIFTCODE. For instance, Qwen2.5-Coder-7B-Instruct's pass@1 score increases from 44.8% to 57.7%, while the average execution time for correct tasks decreases by 48.4%. SWIFTCODE offers a scalable and effective solution for advancing AI-driven code generation, benefiting both software development and computational problem-solving.
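The candidate-selection step described above (run each generated solution locally, measure its cost, keep the cheapest correct one) can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name, the 10-second timeout, and the use of wall-clock time alone (omitting the memory measurement the paper also performs) are all simplifying assumptions.

```python
import subprocess
import sys
import time

def select_most_efficient(candidates, stdin_data=""):
    """Run each candidate Python source string as a subprocess and
    return the one with the lowest wall-clock time among those that
    exit successfully. Crashing candidates are discarded, mirroring
    the paper's requirement that selected code must first be correct.

    NOTE: a sketch only -- the real pipeline also measures memory
    usage and validates outputs against task test cases.
    """
    best_src, best_time = None, None
    for src in candidates:
        start = time.perf_counter()
        try:
            proc = subprocess.run(
                [sys.executable, "-c", src],
                input=stdin_data, capture_output=True,
                text=True, timeout=10,  # illustrative timeout
            )
        except subprocess.TimeoutExpired:
            continue  # too slow: treat as failed
        elapsed = time.perf_counter() - start
        if proc.returncode != 0:
            continue  # incorrect/crashing candidate: discard
        if best_time is None or elapsed < best_time:
            best_src, best_time = src, elapsed
    return best_src
```

In the full pipeline this per-task winner would be paired with its task description to form one instruction-tuning example.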
Lay Summary: Why do LLMs generate less efficient code than human expert-written solutions? Our empirical study reveals that the efficiency of LLM-generated code is strongly correlated with the efficiency of the training data. Based on this observation, we construct an efficient code instruction-tuning dataset, EffiInstruct, to fine-tune LLMs and thereby improve the efficiency of the code they generate.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/EffiBench/EffiCoder
Primary Area: Deep Learning->Large Language Models
Keywords: LLM-generated code, efficiency
Submission Number: 4607