Keywords: Large language models, code generation, lifelong learning, prompting, library-level understanding, version-sensitive code
TL;DR: We propose GitChameleon, a novel dataset and benchmark containing approximately 12,000 samples of version-sensitive code, and use it to evaluate a suite of code LLMs, demonstrating their current shortcomings in generating version-specific code.
Abstract: The ever-changing landscape of programming languages poses a significant challenge in the development and training of models designed for code generation. Code exists in a dynamic and constantly evolving environment, necessitating continuous adaptation to stay in sync with the rapidly shifting paradigms, frameworks, and methodologies of the software development domain. The inherent variability in coding styles, the emergence of new programming languages, and the continuous evolution of libraries and packages underscore the imperative for a proactive approach to updating code generation models. In response to this challenge, we introduce $\textcolor{violet}{\textbf{GitChameleon}}$, an innovative dataset comprising more than 12,000 version-sensitive examples in Python, designed to facilitate research into the adaptation of code generation models to the rapidly changing landscape of programming languages. Furthermore, we assess the performance of state-of-the-art code models and demonstrate their inadequacy in generating version-specific code. For example, the latest CodeLlama-70B achieves only a 46.76\% exact string match score when evaluated on $\textcolor{violet}{\textbf{GitChameleon}}$.
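To make "version-sensitive example" and "exact string match" concrete, here is a minimal illustrative sketch. The field names, the sample structure, and the specific API change shown are assumptions for illustration only and are not drawn from the actual GitChameleon dataset.

```python
# Hypothetical version-sensitive sample: the correct solution depends on the
# pinned library version (DataFrame.append was removed in pandas 2.0).
sample = {
    "library": "pandas",
    "version": "2.0",
    "prompt": "Append row `r` (a one-row DataFrame) to DataFrame `df`.",
    "ground_truth": "pd.concat([df, r])",  # version-correct replacement for df.append(r)
}

def exact_string_match(prediction: str, ground_truth: str) -> bool:
    """Score a model completion by exact string equality after trimming whitespace."""
    return prediction.strip() == ground_truth.strip()

# A model that emits the outdated API fails under this metric.
print(exact_string_match("df.append(r)", sample["ground_truth"]))      # False
print(exact_string_match("pd.concat([df, r])", sample["ground_truth"]))  # True
```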
Primary Subject Area: Data collection and benchmarking techniques
Paper Type: Research paper: up to 8 pages
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 13