Abstract: Given the severity of malware attacks, data-driven approaches that learn from massive malware observations have been validated. Deep learning-based approaches that learn unified features by exploiting the local and sequential nature of the control flow graph have achieved the best performance. However, considering only local and sequential information from a graph-based malware representation is not enough to model its semantics, such as the structural and functional nature of malware. In this paper, the functional nature is incorporated into the control flow graph by adding opcodes, and the structural nature is embedded through the DeepWalk algorithm. We then propose a transformer-based malware control flow embedding to overcome the difficulty of modeling long-term control flow and to selectively learn the code embeddings. Extensive experiments show improved performance over the latest deep learning-based graph embedding methods, and a 37.50% improvement in recall was confirmed for the Simda botnet attack.
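To make the DeepWalk step more concrete, the following is a minimal sketch of how random walks over an opcode-labelled control flow graph could be turned into token sequences. The toy graph, the opcode labels, and the names `CFG`, `OPCODES`, `random_walk`, and `build_corpus` are all hypothetical illustrations, not the paper's data or implementation. DeepWalk would then train a skip-gram model on such sequences to obtain embeddings; that step is omitted here.

```python
import random

# Hypothetical toy control flow graph: basic block -> successor blocks.
CFG = {
    "b0": ["b1", "b2"],
    "b1": ["b3"],
    "b2": ["b3"],
    "b3": ["b0"],
}
# Hypothetical opcode label attached to each basic block.
OPCODES = {"b0": "cmp", "b1": "mov", "b2": "call", "b3": "jmp"}


def random_walk(graph, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        successors = graph.get(walk[-1], [])
        if not successors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(successors))
    return walk


def build_corpus(graph, opcodes, walks_per_node=2, length=5, seed=0):
    """Start `walks_per_node` walks at every node and map nodes to opcodes."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in random order each pass
        for node in nodes:
            walk = random_walk(graph, node, length, rng)
            corpus.append([opcodes[n] for n in walk])
    return corpus


corpus = build_corpus(CFG, OPCODES)
```

Each entry of `corpus` is an opcode sequence that follows the edges of the graph, so nearby tokens reflect structural proximity in the control flow. Such sequences could in principle be fed to a skip-gram model and then to a sequence encoder such as a transformer.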