Abstract: Automated code generation is a pivotal capability of large language models (LLMs). However, assessing this capability in real-world scenarios remains challenging. Previous methods focus more on low-level code generation, such as model loading, rather than on generating high-level code tailored to real-world tasks, such as image-to-text and text classification, across various domains. Therefore, we construct AICoderEval, a dataset of real-world tasks in various domains built on HuggingFace, PyTorch, and TensorFlow, along with comprehensive metrics for evaluating and enhancing LLMs' task-specific code generation capability. We then propose CoderGen, an agent-based framework that helps LLMs generate code for the real-world tasks in AICoderEval. Moreover, we train a more powerful task-specific code generation model, named AICoder, which is fine-tuned from CodeLlama on AICoderEval. Our experiments demonstrate the effectiveness of CoderGen in improving LLMs' task-specific code generation capability (by 30.26\% on SR@All and 19.88\% on SR@Any). The proposed AICoder also outperforms current code generation LLMs, indicating the high quality of the AICoderEval benchmark for evaluating and enhancing LLMs' task-specific code generation capability.
Paper Type: long
Research Area: Generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English