Source Code Data Augmentation for Deep Learning: A Survey

Abstract: The growing adoption of deep learning models for many critical source code tasks motivates the development of data augmentation (DA) techniques that enrich training data and improve various capabilities of these models (e.g., robustness and generalizability). Although a series of DA methods have been proposed and tailored for source code models, there is no comprehensive survey examining their effectiveness and implications. This paper fills this gap with an integrative survey of data augmentation for source code, in which we systematically compile and synthesize the existing literature to provide an overview of the field. Complementing this, we maintain a continually updated GitHub repository that lists up-to-date papers on DA for source code modeling\footnote{\url{}}.
