Abstract: The number of malware variants is growing at an unprecedented rate. To evade detection by existing antivirus engines, attackers increasingly rely on complex packers, multiple layers of obfuscation, and encryption, all of which obstruct reverse engineering. This paper presents an automated static-analysis method that extracts opcode sequences of up to 5000 opcodes and uses them to classify potential malware into eight classes: ransomware, trojan, backdoor, rootkit, virus, miner, benign, and other. Our empirical analysis compares four classifiers: MLP, LSTM, GRU, and Transformer. The experimental results show that the GRU achieves the highest F1-score, up to 87%. In addition, we analyze dynamic API call sequences using a public malware dataset comprising more than 7000 samples, each a sequence of 342 API calls, from apps in eight different malware families. A GRU network again achieves the best result on this dataset, with an F1-score of 78%.
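To make the input format concrete, the following is a minimal sketch of how opcode sequences could be turned into fixed-length integer inputs for a sequence classifier. The abstract does not describe tokenization or padding, so the vocabulary scheme, the `PAD_ID` and `UNK_ID` values, and the function names `build_vocab` and `encode` are all assumptions made for illustration; only the maximum length of 5000 and the eight class names come from the text.

```python
from collections import Counter

MAX_LEN = 5000  # maximum opcode sequence length stated in the abstract
PAD_ID = 0      # assumption: ID reserved for padding short sequences
UNK_ID = 1      # assumption: ID for opcodes missing from the vocabulary

# The eight target classes named in the abstract.
CLASSES = ["ransomware", "trojan", "backdoor", "rootkit",
           "virus", "miner", "benign", "other"]


def build_vocab(sequences):
    """Map each opcode mnemonic to an integer ID, most frequent first."""
    counts = Counter(op for seq in sequences for op in seq)
    # IDs start at 2 because 0 and 1 are reserved for PAD_ID and UNK_ID.
    return {op: i + 2 for i, (op, _) in enumerate(counts.most_common())}


def encode(seq, vocab, max_len=MAX_LEN):
    """Truncate to max_len opcodes, map to IDs, and right-pad with PAD_ID."""
    ids = [vocab.get(op, UNK_ID) for op in seq[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the paper's dataset.
    train = [["push", "mov", "mov"], ["call", "mov"]]
    vocab = build_vocab(train)
    print(encode(["mov", "xor"], vocab, max_len=4))
```

Encoded sequences of this shape could then be fed to any of the compared models (MLP, LSTM, GRU, or Transformer), typically through an embedding layer; that choice is likewise not specified in the abstract.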