Abstract: Machine learning models are widely used for malware detection based on static features. However, many studies in this area use inconsistent experimental settings and often fail to adequately consider the nature of the datasets, the underlying tasks, and the models being evaluated. This lack of standardization complicates the reproducibility of results on public datasets. In this paper, we address these challenges by proposing a more rigorous experimental and model selection methodology for malware detection. Specifically, we focus on Android malware detection using two public datasets, with models evaluated under offline and continuous active learning settings. We implement six machine learning models of varying complexity across diverse experimental configurations. Our results show that tree-based methods, such as XGBoost, frequently outperform advanced neural networks across these scenarios. To promote reproducibility, we open-source our code and design it to be extensible to new models and datasets.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Included a PDF with results on threshold tuning using the validation set.
Assigned Action Editor: ~Fernando_Perez-Cruz1
Submission Number: 4149