Abstract: As malware grows in complexity and sophistication, advanced detection methods such as deep neural networks (DNNs) for malware classification have become increasingly vital for safeguarding digital infrastructure and protecting sensitive data. To measure progress in this safety-critical landscape, we propose two malware classification benchmarks: a feature-based benchmark and an image-based benchmark. Feature-based datasets provide a detailed view of malware characteristics, while image-based datasets transform raw malware binaries into grayscale images for swift processing. Both can be used for binary classification (benign vs. malicious) and for classifying known malware into a particular family. The benchmarks span varying difficulty levels so that improvements in malware classification strategies can be quantified. Key contributions include the creation of the feature and image dataset benchmarks and the validation of a trained binary classification network on the feature dataset benchmark. The benchmarks and example training code are available.
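The transformation of raw binaries into grayscale images can be illustrated with a minimal sketch. It is not the benchmark's actual preprocessing: the function name `bytes_to_grayscale`, the fixed row width, and the zero-padding of the final row are all assumptions made for illustration, since the abstract does not specify these details. The idea shown is only that each byte (0 to 255) is read as one pixel intensity.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Interpret each byte as one grayscale pixel, filling rows left to right.

    Hypothetical helper: the row width and zero-padding are illustrative
    assumptions, not choices confirmed by the benchmark.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so the last row has full width.
    remainder = len(data) % width
    if remainder:
        data = data + bytes(width - remainder)
    # Slice the byte string into rows of `width` pixel intensities.
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


# Tiny stand-in for a binary file's contents.
sample = bytes([0, 127, 255, 16, 32])
image = bytes_to_grayscale(sample, width=4)
```

In practice the resulting rows would typically be converted to an array or image file and resized to a fixed shape before being fed to a classifier; those steps are left out here because they depend on tooling the abstract does not name.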