AST2Vec: A Robust Neural Code Representation for Malicious PowerShell Detection

Han Miao; Huaifeng Bao; Zixian Tang; Wenhao Li; Wen Wang; Huashan Chen; Feng Liu; Yanhui Sun

Published: 01 Jan 2023, Last Modified: 11 Apr 2025. Venue: SciSec 2023. License: CC BY-SA 4.0

Abstract: In recent years, PowerShell has become a common vehicle for cyber attacks. Because PowerShell is a scripting language, malicious scripts are easy to obfuscate and therefore difficult to detect directly with traditional anti-virus software. Existing advanced detection methods generally deobfuscate scripts before detection. However, most deobfuscation tools cannot precisely recover obfuscated scripts because new obfuscation techniques keep emerging. To solve this problem, we propose a robust neural code representation method, AST2Vec, which detects malicious PowerShell without deobfuscating scripts. We define six types of recovery-related statement nodes in the Abstract Syntax Tree (AST) to identify obfuscated subtrees. AST2Vec then splits the large AST of an entire PowerShell script into a set of small subtrees rooted at these six types of nodes and performs tree-based neural embedding on all extracted subtrees, capturing the lexical and syntactic knowledge of statement nodes. On the resulting sequence of statement vectors, a bidirectional recursive neural network (Bi-RNN) leverages the context of statements and produces the final vector representation of the script. We evaluate the proposed method on malicious PowerShell detection through extensive experiments. The results indicate that our model outperforms state-of-the-art approaches.
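The subtree-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the six root node types, so the `ROOT_KINDS` set, the `Node` class, the toy tree, and the rule that a nested statement node is cut out into its own subtree are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy AST node (hypothetical; a real PowerShell AST carries far more detail)."""
    kind: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical stand-ins for the six statement node types, which the abstract does not list.
ROOT_KINDS = {"Assignment", "If", "Pipeline"}


def split_subtrees(root: Node) -> List[Node]:
    """Split an AST into subtrees rooted at ROOT_KINDS nodes, in preorder.

    Assumption: a statement node nested inside another statement is detached
    from its parent subtree and emitted as a separate subtree.
    """
    subtrees: List[Optional[Node]] = []

    def copy_until_root(node: Node) -> Node:
        kept = []
        for child in node.children:
            if child.kind in ROOT_KINDS:
                visit(child)  # nested statement becomes its own subtree
            else:
                kept.append(copy_until_root(child))
        return Node(node.kind, kept)

    def visit(node: Node) -> None:
        if node.kind in ROOT_KINDS:
            slot = len(subtrees)
            subtrees.append(None)  # reserve the slot so output stays in preorder
            subtrees[slot] = copy_until_root(node)
        else:
            for child in node.children:
                visit(child)

    visit(root)
    return [t for t in subtrees if t is not None]


def flatten(node: Node) -> List[str]:
    """Preorder list of node kinds, used here only to inspect a subtree."""
    kinds = [node.kind]
    for child in node.children:
        kinds.extend(flatten(child))
    return kinds


# Toy script: one assignment, then an `if` whose body holds a pipeline.
script = Node("ScriptBlock", [
    Node("Assignment", [Node("Variable"), Node("StringConstant")]),
    Node("If", [
        Node("BinaryExpression", [Node("Variable"), Node("Constant")]),
        Node("Pipeline", [Node("Command")]),
    ]),
])

statement_sequence = [flatten(t) for t in split_subtrees(script)]
```

In this sketch, `statement_sequence` holds one entry per statement subtree in source order. In the method described above, each subtree would be embedded into a statement vector and the resulting sequence passed to the Bi-RNN; neither step is modeled here.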
