Keywords: Malware Detection, Graph Attention Networks, Dynamic Gating Mechanisms, Multi-Modal Fusion, Transfer Learning
TL;DR: This paper proposes DGSM-SCAM-GAT and MMT-ViT, two complementary models for robust malware detection via graph-based dynamic API analysis and multi-modal static feature fusion, reaching 99.31% and 99.54% accuracy, respectively, on their benchmark datasets.
Abstract: Malware detection encounters substantial challenges in real-time and multi-class tasks, as single-modality methods struggle to capture intricate behavioral patterns. To mitigate these limitations, we introduce two complementary models: DGSM-SCAM-GAT and MMT-ViT. The DGSM-SCAM-GAT model integrates dynamic gating, contextual aggregation mechanisms, and graph attention networks (GAT) to enhance temporal and structural modeling of API call sequences. Trained on a dynamic API call sequence dataset, it attains an accuracy of 99.31% and an F1-score of 99.64%, surpassing CNN-LSTM (accuracy: 98.92%). The MMT-ViT model employs multi-modal attention mechanisms and the pre-trained ViT architecture to effectively fuse features from assembly instruction sequences, binary grayscale images, and binary wavelet sequence features. Evaluated on a public dataset, it achieves 99.54% accuracy and 99.55% F1-score, outperforming Malcse (accuracy: 98.94%). Furthermore, ablation studies validate the critical contributions of individual modules, while comparative experiments underscore the superiority of our proposed models over state-of-the-art baselines. The detection frameworks developed in this study facilitate robust dynamic and static malware identification, with code available in the supplementary materials.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5118