Rethinking the Role of Structural Information: How It Enhances Code Representation Learning?

Published: 01 Jan 2024, Last Modified: 20 Dec 2024, IJCNN 2024, CC BY-SA 4.0
Abstract: Code pre-trained models (CodePTMs) have recently achieved remarkable results in software engineering. However, progress in understanding the inner mechanisms of these models, as well as their sensitivity to samples of varying quality, remains limited. Code has a more rigid and structured syntax than natural language; leveraging and understanding structural information is therefore essential for analyzing, interpreting, and utilizing CodePTMs. While previous studies have verified models' ability to acquire knowledge from code structure through techniques such as attention analysis and probing tasks, the specific roles structural information plays in downstream tasks have yet to be explored. In this work, we propose a set of novel and practical methods for probing and exploiting the structural information within code. In particular, we first employ dataflow perturbation experiments to explore how sensitive models with varying levels of structural information are to input changes. Based on our findings, we propose structure-aware exemplar selection strategies for both code generation and code understanding, aiming to recover model performance at minimal cost under perturbed conditions. Moreover, fine-tuning with selected exemplars offers an efficient alternative to full fine-tuning.
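The abstract does not include the paper's perturbation code, so the sketch below is only a rough illustration of one plausible form a dataflow perturbation could take: rewiring a single def-use edge by swapping the identifier at one variable-use site while keeping the program syntactically valid. The function name perturb_dataflow and the specific perturbation strategy are assumptions made for illustration, not the authors' implementation.

```python
import ast
import random

def perturb_dataflow(source: str, seed: int = 0) -> str:
    """Rewire one def-use edge: at a randomly chosen variable-use site,
    replace the identifier with a different in-scope variable name.
    The output still parses, but the dataflow graph has changed.
    (Hypothetical sketch; not the paper's actual perturbation code.)"""
    rng = random.Random(seed)
    tree = ast.parse(source)
    # Collect variable-use sites: Name nodes read in a Load context.
    uses = [n for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)]
    names = sorted({n.id for n in uses})
    if len(names) < 2:
        return source  # not enough distinct variables to swap
    target = rng.choice(uses)
    target.id = rng.choice([m for m in names if m != target.id])
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+

snippet = (
    "def area(width, height):\n"
    "    result = width * height\n"
    "    return result\n"
)
print(perturb_dataflow(snippet))
```

Feeding a model both the original and perturbed variants and comparing downstream metrics would then quantify its reliance on dataflow structure, in the spirit of the sensitivity experiments the abstract describes.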