Source Code Data Augmentation for Deep Learning: A Survey

ACL ARR 2024 June Submission 2224 Authors

15 Jun 2024 (modified: 07 Aug 2024) | ACL ARR 2024 June Submission | CC BY 4.0

Abstract: The growing adoption of deep learning models in many critical source code tasks motivates the development of data augmentation (DA) techniques that enhance training data and improve capabilities of these models such as robustness and generalizability. Although a series of DA methods have been proposed and tailored for source code models, the field lacks comprehensive surveys and examinations of their effectiveness and implications. This paper fills this gap with an integrative survey of data augmentation for source code, in which we systematically compile and summarize the existing literature to provide an overview of the field. Complementing this, we present a continually updated GitHub repository that hosts a list of up-to-date papers on DA for source code modeling (https://anonymous.4open.science/r/da4code).

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: code generation and understanding, data augmentation

Contribution Types: Position papers, Surveys

Languages Studied: English, Programming Languages

Submission Number: 2224