WaveCoder: Widespread And Versatile Enhancing Code Large Language Models By Instruction Tuning

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can acquire impressive capabilities across a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs mainly focus on the traditional code generation task, resulting in poor performance in complex multi-task scenarios. In this paper, we concentrate on multiple code-related tasks and present WaveCoder, a series of Code LLMs trained with Widespread And Versatile Enhanced instruction data. To enable the models to tackle complex code-related tasks, we propose a method to stably generate diverse, high-quality instruction data from open-source code datasets in multi-task scenarios. With it we obtain CodeOcean, a dataset comprising 19,915 instruction instances across 4 code-related tasks, designed to improve the generalization ability of Code LLMs. Our experiments demonstrate that WaveCoder models significantly outperform other open-source models in generalization ability across different code-related tasks. Moreover, WaveCoder-Ultra-6.7B achieves state-of-the-art generalization on a wide range of code-related tasks.
Paper Type: long
Research Area: Generation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Section 7 (Limitations), page 9.
A2: yes
A2 Elaboration For Yes Or No: Section 8 (Ethics Statement), page 9.
A3: yes
A3 Elaboration For Yes Or No: Section 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 1
B2: yes
B2 Elaboration For Yes Or No: Section 8 (Ethics Statement), page 9.
B3: yes
B3 Elaboration For Yes Or No: Section 8 (Ethics Statement), page 9.
B4: yes
B4 Elaboration For Yes Or No: Section 8 (Ethics Statement), page 9.
B5: yes
B5 Elaboration For Yes Or No: Section 1
B6: yes
B6 Elaboration For Yes Or No: Section 3.1
C: yes
C1: yes
C1 Elaboration For Yes Or No: Section 3.1
C2: yes
C2 Elaboration For Yes Or No: Section 3.1
C3: yes
C3 Elaboration For Yes Or No: Section 3.2
C4: yes
C4 Elaboration For Yes Or No: Section 3.1, Section 3.2
D: no
D1: no
D1 Elaboration For Yes Or No: We don't use human annotators.
D2: no
D2 Elaboration For Yes Or No: We don't use human annotators.
D3: no
D3 Elaboration For Yes Or No: We don't use human annotators.
D4: no
D4 Elaboration For Yes Or No: We don't use human annotators.
D5: no
D5 Elaboration For Yes Or No: We don't use human annotators.
E: yes
E1: yes
E1 Elaboration For Yes Or No: Section 3