G-GQSA: Exploiting Feature-Based Vulnerabilities and Enhancing Adversarial Resilience in Android Malware Detection
Abstract: Android, as the world's dominant mobile operating system, provides a broad ecosystem of applications accessible through official channels like the Google Play Store and various third-party platforms. Despite rigorous security protocols, malicious actors continue to compromise user safety by embedding malware within seemingly innocuous apps. The increasing sophistication of these threats has driven the adoption of advanced machine learning (ML) and deep learning (DL) techniques for malware detection. While these techniques have shown significant promise, particularly in identifying complex and evolving malware, they remain vulnerable to adversarial attacks. In this work, we introduce a targeted evasion attack called Gradient-Guided Q-Learning with Simulated Annealing (G-GQSA), designed for grey-box scenarios, to evaluate the resilience of these detection models. G-GQSA crafts adversarial examples by minimally perturbing the binary permission and intent features of Android apps, thereby evading detection. Our experiments show that G-GQSA achieves a 100% fooling rate with an average of only 3.474 perturbations across 13 permission-based models and only 1.630 perturbations across 13 intent-based models. We also conduct a comprehensive feature analysis to evaluate how accurately our method identifies significant features. This analysis reveals an overlap of 81.71% in critical features across all 26 classification models, demonstrating our method's effectiveness in identifying the features that most influence model predictions. Finally, we apply adversarial retraining to enhance the robustness of the detection models, reducing G-GQSA's fooling rate to 16.77% across the same 26 models.
Our study underscores the critical need to understand the origins and interactions of adversarial samples with different malware families and emphasizes the importance of developing robust defense mechanisms before deploying ML and DL-based detection systems in real-world applications.
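To make the attack family concrete, the following is a minimal illustrative sketch, not the paper's actual G-GQSA implementation: it evades a hypothetical linear detector over binary permission features by greedily flipping the bit whose gradient most lowers the malware score, and falls back to a simulated-annealing acceptance rule when no improving flip exists. The detector weights, feature count, and all parameter names here are assumptions for illustration only (the paper's method additionally uses Q-learning, which is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "detector": score(x) > 0 means the app is flagged
# as malware. The weight vector w stands in for a trained model's
# gradient with respect to the binary (0/1) permission features.
n_features = 20
w = rng.normal(size=n_features)
b = -0.5

def score(x):
    return float(x @ w + b)

def attack(x, max_flips=20, t0=1.0, cooling=0.8):
    """Flip binary features (0->1 adds a permission, 1->0 removes one)
    until the detector's score drops below 0. Flips are chosen by the
    gradient; non-improving flips are accepted with a simulated-annealing
    probability that decays as the temperature cools."""
    x = x.copy()
    t = t0
    flips = 0
    while score(x) > 0 and flips < max_flips:
        # For a linear score, flipping feature i changes the score by
        # (1 - 2*x[i]) * w[i]; the attacker's gain is the negative of that.
        gain = -(1 - 2 * x) * w
        i = int(np.argmax(gain))
        if gain[i] > 0:
            x[i] = 1 - x[i]          # greedy improving flip
            flips += 1
        else:
            j = int(rng.integers(n_features))
            if rng.random() < np.exp(gain[j] / t):
                x[j] = 1 - x[j]      # SA-accepted non-improving flip
                flips += 1
        t *= cooling                 # cool the temperature
    return x, flips

# Usage: start from the feature vector that maximizes the malware score.
x0 = (w > 0).astype(float)
x_adv, n_flips = attack(x0)
```

In a real grey-box setting the gradient would come from a surrogate model rather than the target's weights, and the perturbation budget (`max_flips`) would be kept small to preserve the app's functionality, mirroring the low average perturbation counts reported above.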