API2Vec: Learning Representations of API Sequences for Malware Detection

Published: 01 Jan 2023, Last Modified: 12 Sept 2024, ISSTA 2023, CC BY-SA 4.0
Abstract: Analyzing malware based on API call sequences is an effective approach, as the sequence reflects the dynamic execution behavior of malware. Recent advances in deep learning have led to its application for mining useful information from API call sequences. However, these methods mainly operate on raw sequences and may not effectively capture important information, especially for multi-process malware, mainly due to the API call interleaving problem. Motivated by this, the paper presents API2Vec, a graph-based API embedding method for malware detection. First, we build a graph model to represent the raw sequence. In particular, we design the temporal process graph (TPG) to model inter-process behavior and the temporal API graph (TAG) to model intra-process behavior. With these graphs, we design a heuristic random walk algorithm to generate a number of paths that capture fine-grained malware behavior. By pre-training on the paths using the Doc2Vec model, we are able to generate embeddings of paths and APIs, which can then be used for malware detection. Experiments on a real malware dataset demonstrate that API2Vec outperforms state-of-the-art embedding methods and detection methods in both accuracy and robustness, especially for multi-process malware.
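To make the interleaving problem and the graph-then-walk idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a trace is a list of `(pid, api_name)` records, builds one weighted API-transition graph per process (a rough stand-in for the intra-process TAG), and samples paths with a plain weighted random walk in place of the paper's heuristic walk. The inter-process TPG is not modeled. The function names `build_api_graphs` and `random_walk` and the sample API names are hypothetical.

```python
import random
from collections import defaultdict


def build_api_graphs(trace):
    """Split an interleaved trace of (pid, api_name) records by process and
    build one weighted API-transition graph per process.

    Returns graphs[pid][src_api][dst_api] -> transition count.
    Assumption: this only approximates an intra-process graph; it is not
    the paper's TAG or TPG definition.
    """
    last_api = {}  # most recent API seen for each pid
    graphs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for pid, api in trace:
        if pid in last_api:
            graphs[pid][last_api[pid]][api] += 1
        last_api[pid] = api
    return graphs


def random_walk(graph, start, max_len, rng):
    """Sample one path of at most max_len APIs, choosing each next API with
    probability proportional to its transition count (a plain weighted walk,
    swapped in for the paper's heuristic walk)."""
    path = [start]
    node = start
    while len(path) < max_len:
        successors = graph.get(node)
        if not successors:  # dead end: no outgoing transitions
            break
        apis = list(successors)
        weights = [successors[a] for a in apis]
        node = rng.choices(apis, weights=weights, k=1)[0]
        path.append(node)
    return path


# Two processes whose calls are interleaved in the raw trace.
trace = [
    (1, "CreateFile"), (2, "RegOpenKey"), (1, "WriteFile"),
    (2, "RegSetValue"), (1, "CloseHandle"),
]
graphs = build_api_graphs(trace)
path = random_walk(graphs[1], "CreateFile", max_len=5, rng=random.Random(0))
print(path)
```

In the raw trace, `CreateFile` is followed by `RegOpenKey` from a different process; grouping by process restores the per-process ordering before any paths are sampled. Paths produced this way could then be passed as token lists to a document-embedding model such as Doc2Vec, which is the pre-training step the abstract describes.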