Research on Data Drift and Class Imbalance in Android Malware Detection

Zhen Liu, Ruoyu Wang, Bitao Peng, Changji Wang, Qingqing Gan

Published: 2023, Last Modified: 12 Aug 2025, MobiQuitous (1) 2023, CC BY-SA 4.0

Abstract: In the Android ecosystem, malware detection remains a nontrivial task. Recent works have applied convolutional neural networks (CNNs) to detect Android malware. However, data drift and class imbalance are still open problems in this field. If samples are represented by unstable features, the distribution of malware data may vary significantly over time, leading to data drift; as a result, a trained model may fail to detect malware effectively in future data. In addition, class imbalance may degrade a model's ability to identify specific types of malware that have few training samples. To address both problems, this paper presents a new Android malware detection framework. Specifically, we devise a data distribution-aware feature learning framework that learns features with a stable distribution to handle data drift. We further devise a new loss function for the CNN to handle class imbalance. With this loss function, the model places greater emphasis on learning minority-class samples and hard samples. Experimental results on real datasets show that our method outperforms existing works for Android malware detection on datasets with data drift and class imbalance problems.
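The abstract does not give the form of the proposed loss. Purely as a point of reference, the sketch below implements a well-known loss with the two properties the abstract describes: a class-weighted focal loss, in which a per-class weight alpha up-weights minority classes and a modulating factor (1 - p)^gamma shrinks the loss of easy, well-classified samples. This is a substitute technique, not the authors' loss; the function names, the inverse-frequency weighting scheme, and the default gamma = 2 are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence

# Illustrative class-weighted focal loss. NOT the loss defined in the paper;
# all names and defaults here are hypothetical choices for this sketch.


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> list[float]:
    """Per-class weights proportional to 1 / class count, scaled to average 1.

    Classes absent from `labels` get weight 0 (assumption; a real pipeline
    might smooth the counts instead).
    """
    counts = Counter(labels)
    raw = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    scale = num_classes / sum(raw)
    return [w * scale for w in raw]


def weighted_focal_loss(
    batch_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    class_weights: Sequence[float],
    gamma: float = 2.0,
) -> float:
    """Batch mean of -alpha_y * (1 - p_y)^gamma * log(p_y).

    alpha_y (class_weights[y]) up-weights minority classes; the
    (1 - p_y)^gamma factor shrinks the loss of confidently correct (easy)
    samples, so misclassified (hard) samples contribute relatively more.
    """
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        p_y = max(softmax(logits)[y], 1e-12)  # guard against log(0)
        total += -class_weights[y] * (1.0 - p_y) ** gamma * math.log(p_y)
    return total / len(labels)


if __name__ == "__main__":
    # Toy labels: 0 = benign, 1 and 2 = two malware types (made-up data).
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
    weights = inverse_frequency_weights(labels, num_classes=3)
    print("class weights:", weights)

    easy = weighted_focal_loss([[5.0, 0.0, 0.0]], [0], weights)
    hard = weighted_focal_loss([[0.0, 5.0, 0.0]], [0], weights)
    print("easy-sample loss:", easy, "hard-sample loss:", hard)
```

Setting gamma = 0 and all class weights to 1 recovers ordinary cross-entropy, so the minority-class term and the hard-sample term can be ablated independently.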

External IDs: dblp:conf/mobiquitous/Liu0PWG23