Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only

Marek Krčál; Ondřej Švec; Martin Bálek; Otakar Jašek

12 Feb 2018 (modified: 05 May 2023), ICLR 2018 Workshop Submission, Readers: Everyone

Abstract: We propose and evaluate a simple convolutional deep neural network architecture that detects malicious Portable Executables (Windows executable files) by learning from their raw byte sequences and labels only, that is, without any domain-specific feature extraction or preprocessing. On a dataset of 20 million unpacked half-megabyte Portable Executables, this end-to-end approach achieves performance almost on par with Avast's traditional machine learning pipeline based on handcrafted features.

TL;DR: We train a deep convolutional malware classifier on 20 million Windows EXE files represented as raw sequences of bytes and obtain results almost on par with Avast's ML system based on human-engineered features.

Keywords: deep learning, convolution, malware detection, windows executable files, portable executable, end-to-end, representation learning
