Abstract: Android malware is a major cyber threat to the popular Android platform and may affect millions of end users. To combat it, a large number of machine learning methods have been developed, based either on 1) traditional feature extraction using static and dynamic analysis, or 2) recently proposed image representations, and they have achieved promising results. However, the vast majority of existing work relies on a large number of labeled samples, which are unfortunately not available for newly reported Android malware families. This makes detecting such few-shot Android malware families a critical challenge. In this paper, we propose a novel few-shot learning approach based on the image representation of an Android application to solve the problem. By converting an application file into an image representation, we preserve all the source code information. We then use self-supervised learning to obtain a pre-trained backbone from unlabeled auxiliary data and employ a metric-based few-shot learning method for Android malware classification. Considering the impact of irrelevant information across samples on family classification, we employ a multi-cropping strategy to capture family label-related information in the images. Extensive experimental results on the popular CICInvesAndMal2019 dataset confirm the effectiveness of our approach in detecting few-shot Android malware families. We achieve improvements of at least 3.16% and 3.7% in the 5-way 1-shot and 5-way 5-shot scenarios, respectively, compared with state-of-the-art baselines.
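As a rough illustration of two ideas named in the abstract, the sketch below lays raw file bytes out as a grayscale image and then classifies a query embedding by its nearest class prototype. It is not the authors' implementation: the abstract does not specify the byte-to-pixel mapping or which metric-based method is used, so the one-byte-per-pixel layout and the Euclidean nearest-prototype rule are assumptions, and the function names `bytes_to_image`, `class_prototypes`, and `classify` are hypothetical. The self-supervised backbone and the multi-cropping step are omitted; the embeddings are toy vectors standing in for backbone outputs.

```python
import math
from typing import Dict, List, Sequence


def bytes_to_image(data: bytes, width: int = 4) -> List[List[int]]:
    """Lay raw bytes out row by row as a grayscale image.

    Assumption: one byte becomes one pixel (0-255); the last row is zero-padded.
    """
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def class_prototypes(
    support: Dict[str, List[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the support embeddings of each family into one prototype vector."""
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in support.items()
    }


def classify(query: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the family whose prototype is closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))
```

In a 5-way 1-shot episode, `support` would hold five families with one embedding each, so every prototype is simply that single support embedding.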
External IDs: dblp:journals/cybersec/ZhouWXSW25