Learning Code Representations Using Multifractal-based Graph Networks

Guixiang Ma, Yao Xiao, Mihai Capota, Theodore L. Willke, Shahin Nazarian, Paul Bogdan, Nesreen K. Ahmed

IEEE BigData 2021 (modified: 26 Aug 2022)

Abstract: Learning representations of software code is a critical problem for a wide range of system applications, e.g., compiler optimization, software classification, malicious software detection, and performance optimization. Recently, graph-based representations of software programs have been used to model the inherent structural dependencies in programs written in languages such as C++ and Python. In this paper, we propose a novel graph neural network (GNN) framework that applies multifractal analysis to LLVM intermediate representations (IR). We then show empirically that the proposed framework is capable of capturing long-range structural dependencies that appear in software code. We conduct experiments and comparisons on two downstream system applications: (1) predicting heterogeneous compute device mappings (graph classification), and (2) compiler reachability analysis (node classification). We observe that introducing a structural inductive bias through multifractal topological features enables GNNs to capture long-range dependencies among nodes and thus improves the accuracy of GNN models in applications that require learning code representations.
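To give a rough sense of what a fractal-style topological node feature can look like, the sketch below estimates a local scaling exponent for each node: it counts the nodes N(r) within r hops and fits the slope of log N(r) against log r. This is only an illustration under stated assumptions. The function names `ball_sizes` and `local_scaling_exponent`, the adjacency-dict graph format, and the choice of a single exponent per node are all hypothetical and are not taken from the paper, whose multifractal features may be defined differently.

```python
import math
from collections import deque


def ball_sizes(adj, source, max_radius):
    """Return [N(1), ..., N(max_radius)], where N(r) is the number of
    nodes within r hops of `source` (the source itself included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_radius:
            continue  # do not expand beyond the largest radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Histogram of hop distances, then a running total per radius.
    counts = [0] * (max_radius + 1)
    for d in dist.values():
        counts[d] += 1
    sizes, total = [], counts[0]
    for r in range(1, max_radius + 1):
        total += counts[r]
        sizes.append(total)
    return sizes


def local_scaling_exponent(adj, source, max_radius=4):
    """Least-squares slope of log N(r) versus log r (hypothetical feature)."""
    sizes = ball_sizes(adj, source, max_radius)
    xs = [math.log(r) for r in range(1, max_radius + 1)]
    ys = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy graph (an assumption for illustration): a path with 9 nodes, 0..8.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
center_exp = local_scaling_exponent(path, 4)
end_exp = local_scaling_exponent(path, 0)
```

On this toy path graph the middle node's neighborhood grows faster than an end node's, so the two nodes receive different exponents. A per-node value of this kind could be appended to the input node features of a GNN as one way to inject a structural inductive bias.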
