Keywords: Large language models, code generation, lifelong learning, prompting, library-level understanding, version-sensitive code
TL;DR: We propose GitChameleon, a novel dataset and benchmark containing approximately 12,000 samples of version-sensitive code, and use it to evaluate a suite of code LLMs, demonstrating their current shortcomings in generating version-specific code.
Abstract: The ever-changing landscape of programming languages poses a significant challenge in the development and training of models designed for code generation. Code exists in a dynamic and constantly evolving environment, necessitating continuous adaptation to stay in sync with the rapidly shifting paradigms, frameworks, and methodologies of the software development domain. The inherent variability in coding styles, the emergence of new programming languages, and the continuous evolution of libraries and packages underscore the imperative for a proactive approach to updating code generation models. In response to this challenge, we introduce $\textcolor{violet}{\textbf{GitChameleon}}$, an innovative dataset comprising more than 12,000 version-sensitive examples in Python, designed to facilitate research into the adaptation of code generation models to the rapidly changing landscape of programming languages. Furthermore, we assess the performance of state-of-the-art code models and demonstrate their inadequacy in generating version-specific code. For example, the latest CodeLlama-70B achieves only a 46.76\% exact string match score when evaluated on $\textcolor{violet}{\textbf{GitChameleon}}$.
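To make "version-sensitive example" and "exact string match" concrete, here is a minimal illustrative sketch. The field names, the sample structure, and the specific API change shown are assumptions for illustration only and are not drawn from the actual GitChameleon dataset.

```python
# Hypothetical version-sensitive sample: the correct solution depends on the
# pinned library version (DataFrame.append was removed in pandas 2.0).
sample = {
    "library": "pandas",
    "version": "2.0",
    "prompt": "Append row `r` (a one-row DataFrame) to DataFrame `df`.",
    "ground_truth": "pd.concat([df, r])",  # version-correct replacement for df.append(r)
}

def exact_string_match(prediction: str, ground_truth: str) -> bool:
    """Score a model completion by exact string equality after trimming whitespace."""
    return prediction.strip() == ground_truth.strip()

# A model that emits the outdated API fails under this metric.
print(exact_string_match("df.append(r)", sample["ground_truth"]))      # False
print(exact_string_match("pd.concat([df, r])", sample["ground_truth"]))  # True
```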
Primary Subject Area: Data collection and benchmarking techniques
Paper Type: Research paper: up to 8 pages
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 13