Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only

Marek Krčál, Ondřej Švec, Martin Bálek, Otakar Jašek

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: We propose and evaluate a simple convolutional deep neural network architecture that detects malicious Portable Executables (Windows executable files) by learning from their raw byte sequences and labels only, that is, without any domain-specific feature extraction or preprocessing. On a dataset of 20 million unpacked half-megabyte Portable Executables, such an end-to-end approach achieves performance almost on par with Avast's traditional machine learning pipeline based on handcrafted features.
  • Keywords: deep learning, convolution, malware detection, windows executable files, portable executable, end-to-end, representation learning
  • TL;DR: We train a deep convolutional malware classifier on 20 million Windows EXE files represented as raw byte sequences and obtain results almost on par with Avast's ML system based on human-engineered features.
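The end-to-end idea described in the abstract can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' architecture: the fixed input length of 512 KiB, the padding token 256, the embedding size, filter count, kernel and stride, and the names bytes_to_ids and TinyByteConvNet are all hypothetical choices. It only shows the input representation (each byte becomes an integer id, with no parsing of PE headers) followed by a byte embedding, a strided 1D convolution with ReLU, global max pooling and a sigmoid output. Weights are random and training is omitted.

```python
import numpy as np

MAX_LEN = 512 * 1024  # assumption: fixed input length of half a megabyte
PAD = 256             # hypothetical padding id, distinct from byte values 0..255


def bytes_to_ids(raw: bytes, max_len: int = MAX_LEN) -> np.ndarray:
    """Map raw file bytes to integer ids, truncating or padding to max_len."""
    ids = np.frombuffer(raw[:max_len], dtype=np.uint8).astype(np.int64)
    out = np.full(max_len, PAD, dtype=np.int64)
    out[: len(ids)] = ids
    return out


class TinyByteConvNet:
    """Hypothetical toy model: embedding -> strided conv1d -> ReLU -> max pool -> sigmoid."""

    def __init__(self, emb_dim=8, n_filters=16, kernel=32, stride=32, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel, self.stride = kernel, stride
        self.emb = rng.normal(0, 0.1, size=(257, emb_dim))  # 256 byte values + PAD
        self.w = rng.normal(0, 0.1, size=(kernel * emb_dim, n_filters))
        self.b = np.zeros(n_filters)
        self.w_out = rng.normal(0, 0.1, size=n_filters)
        self.b_out = 0.0

    def forward(self, ids: np.ndarray) -> float:
        x = self.emb[ids]  # (length, emb_dim): one learned vector per byte id
        n_win = (len(ids) - self.kernel) // self.stride + 1
        if n_win < 1:
            raise ValueError("input shorter than the convolution kernel")
        starts = np.arange(n_win) * self.stride
        # Each convolution window is flattened to a (kernel * emb_dim) vector.
        windows = np.stack([x[s : s + self.kernel].ravel() for s in starts])
        h = np.maximum(windows @ self.w + self.b, 0.0)  # convolution + ReLU
        pooled = h.max(axis=0)  # global max pooling over all positions
        logit = pooled @ self.w_out + self.b_out
        return float(1.0 / (1.0 + np.exp(-logit)))  # score read as P(malicious)


model = TinyByteConvNet()
score = model.forward(bytes_to_ids(b"MZ" + bytes(1000), max_len=1024))
```

Global max pooling makes the output independent of where in the file a convolutional filter fires, which is one common way to handle long byte sequences; whether the paper uses exactly this pooling is not stated in the abstract.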