MADAR: Efficient Continual Learning for Malware Analysis with Diversity-Aware Replay

Mohammad Saidur Rahman; Scott Coull; Qi Yu; Matthew Wright

MADAR: Efficient Continual Learning for Malware Analysis with Diversity-Aware Replay

Mohammad Saidur Rahman, Scott Coull, Qi Yu, Matthew Wright

25 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Malware Analysis, Windows Malware, Android Malware, Catastrophic Forgetting, Continual Learning

TL;DR: We propose MADAR, a novel continual learning framework for malware classification that achieves state-of-the-art performance across both Windows and Android malware domains.

Abstract: Millions of new pieces of malicious software (i.e., malware) are introduced each year. This poses significant challenges for antivirus vendors, who use machine learning to detect and analyze malware, and must keep up with changes in the distribution while retaining knowledge of older variants. Continual learning (CL) holds the potential to address this challenge by reducing the storage and computational costs of regularly retraining over all the collected data. Prior work, however, shows that CL techniques designed primarily for computer vision tasks fare poorly when applied to malware classification. To address these issues, we begin with an exploratory analysis of a typical malware dataset, which reveals that malware families are diverse and difficult to characterize, requiring a wide variety of samples to learn a robust representation. Based on these findings, we propose $\underline{M}$alware $\underline{A}$nalysis with $\underline{D}$iversity-$\underline{A}$ware $\underline{R}$eplay (MADAR), a CL framework that accounts for the unique properties and challenges of the malware data distribution. We extensively evaluate these techniques using both Windows and Android malware, showing that MADAR significantly outperforms prior work. This highlights the importance of understanding domain characteristics when designing CL techniques and demonstrates a path forward for the malware classification domain.

Primary Area: transfer learning, meta learning, and lifelong learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4669

Loading