Keywords: GNN, program analysis
Abstract: Artificial intelligence is attracting growing attention for program analysis and semantic understanding. Because of ever-growing program complexity and scale, prevalent program embedding techniques usually target a single task, for example binary similarity detection, program classification, or automatic completion of program comments. To address this limitation, we explore a generic program embedding approach that aims to solve multiple program analysis tasks. We design models that extract the features of a program, represent the program as an embedding, and use this embedding to solve various analysis tasks. Because different tasks depend not only on the features of the source code but also on its compilation process, traditional source-code or AST-based embedding approaches are no longer applicable. We therefore propose a new program embedding approach that constructs a program representation from the assembly code while exploiting the rich graph structure information present in the program. We tested our model on two tasks, program classification and binary similarity detection, and obtained accuracies of 80.35% and 45.16%, respectively.
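As a rough illustration of the general idea of embedding a program from its assembly code and graph structure, the sketch below builds bag-of-opcode features for each basic block, runs a few rounds of mean-aggregation message passing over control-flow edges, and mean-pools the result into one vector. It is not the model described in the abstract: the toy `BLOCKS` and `EDGES` data, the function names, the opcode-count features, and the untrained averaging scheme are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy input: basic blocks of assembly plus control-flow edges.
BLOCKS = {
    0: ["mov eax, 1", "cmp eax, ebx", "jne L2"],
    1: ["add eax, ebx", "jmp L3"],
    2: ["sub eax, ebx"],
    3: ["ret"],
}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]


def opcode_vocab(blocks):
    """Sorted list of all opcodes (first token of each instruction)."""
    return sorted({ins.split()[0] for body in blocks.values() for ins in body})


def block_features(body, vocab):
    """Bag-of-opcodes count vector for one basic block."""
    counts = Counter(ins.split()[0] for ins in body)
    return [float(counts[op]) for op in vocab]


def message_pass(features, edges):
    """One round: each node averages its own vector with its neighbours' vectors.

    Edges are treated as undirected here (an assumption of this sketch).
    """
    neighbours = {n: [n] for n in features}
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    dim = len(next(iter(features.values())))
    return {
        n: [sum(features[m][i] for m in ns) / len(ns) for i in range(dim)]
        for n, ns in neighbours.items()
    }


def embed_program(blocks, edges, rounds=2):
    """Mean-pool node vectors after a few message-passing rounds."""
    vocab = opcode_vocab(blocks)
    feats = {n: block_features(body, vocab) for n, body in blocks.items()}
    for _ in range(rounds):
        feats = message_pass(feats, edges)
    return [sum(f[i] for f in feats.values()) / len(feats) for i in range(len(vocab))]


embedding = embed_program(BLOCKS, EDGES)
```

A trained GNN would replace the fixed averaging with learned weights, and the pooled vector would then feed task-specific heads such as a classifier or a similarity scorer.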