Abstract: Code representation aims to transform source code into mathematical or vector forms so that models can process and analyze it. The data flow of a program captures how variables are passed and manipulated within the code. Existing research on data flow-oriented code representation mainly uses the semantic information of the variables in the data flow to guide the model in representing code. However, variable semantics alone are insufficient to reveal the structural features of code, which makes it difficult for the model to represent code comprehensively. In this research, we propose a new code representation model, SDCR. Specifically, SDCR employs a specialized algorithm to identify the structural information of variables in the code and combines it with their semantic information to enhance code understanding. SDCR also introduces an attention mechanism that highlights the structural connections between variables, and it employs a special mask matrix to strengthen its perception of code structure. Experimental results and analysis indicate that the proposed model improves the relevant task metric scores by 2.6 points over the baseline model.
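To make the idea of a structure-aware mask matrix over variables more concrete, the following is a minimal, hypothetical sketch. It is not SDCR's algorithm or its actual mask. It assumes straight-line Python code made only of simple assignments, extracts def-use data-flow edges between variable occurrences with the standard `ast` module, and builds a symmetric boolean matrix in which `mask[i][j]` is true only when `i == j` or a data-flow edge links occurrences `i` and `j`. The function name `data_flow_mask` and the edge rules are illustrative assumptions.

```python
import ast


def data_flow_mask(source: str):
    """Hypothetical sketch: build a data-flow mask over variable occurrences.

    Assumes straight-line code consisting only of simple assignments.
    Returns (occurrences, mask), where occurrences is a list of
    (variable_name, "def" | "use") and mask[i][j] is True when i == j
    or a def-use edge connects occurrence i and occurrence j.
    """
    tree = ast.parse(source)
    occurrences = []  # (name, kind) in the order they are processed
    edges = []        # (from_index, to_index): value of 'from' comes from 'to'
    last_def = {}     # variable name -> index of its most recent definition

    for stmt in tree.body:
        if not isinstance(stmt, ast.Assign):
            raise ValueError("this sketch only handles simple assignments")

        # Reads on the right-hand side are evaluated before targets are written.
        reads = []
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name):
                idx = len(occurrences)
                occurrences.append((node.id, "use"))
                reads.append(idx)
                if node.id in last_def:
                    # A use takes its value from the latest definition.
                    edges.append((idx, last_def[node.id]))

        for target in stmt.targets:
            for node in ast.walk(target):
                if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                    idx = len(occurrences)
                    occurrences.append((node.id, "def"))
                    # The assigned value is computed from every read on the RHS.
                    edges.extend((idx, r) for r in reads)
                    last_def[node.id] = idx

    n = len(occurrences)
    # Every occurrence may attend to itself; other pairs only via data flow.
    mask = [[i == j for j in range(n)] for i in range(n)]
    for a, b in edges:
        mask[a][b] = mask[b][a] = True
    return occurrences, mask


occurrences, mask = data_flow_mask("a = 1\nb = a + 2\nc = a * b")
for (name, kind), row in zip(occurrences, mask):
    print(f"{name}:{kind:<3}", "".join("1" if v else "0" for v in row))
```

In a Transformer, a matrix like this is typically applied by setting the attention scores of disallowed pairs (entries that are false) to negative infinity before the softmax, so each variable attends only to the variables it is connected to through data flow. How SDCR actually derives its structural information and mask is described in the paper itself, not by this sketch.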