Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only

12 Feb 2018 (modified: 04 Jun 2018) | ICLR 2018 Workshop Submission | Readers: Everyone
  • Keywords: deep learning, convolution, malware detection, windows executable files, portable executable, end-to-end, representation learning
  • TL;DR: We train a deep convolutional malware classifier on 20 million Windows EXE files represented as raw sequences of bytes and obtain results almost on par with Avast's ML system based on human-engineered features.
  • Abstract: We propose and evaluate a simple deep convolutional neural network architecture that detects malicious Portable Executables (Windows executable files) by learning from their raw sequences of bytes and labels only, that is, without any domain-specific feature extraction or preprocessing. On a dataset of 20 million unpacked half-megabyte Portable Executables, this end-to-end approach achieves performance almost on par with Avast's traditional machine learning pipeline based on handcrafted features.
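
To make "learning from raw sequences of bytes" concrete, here is a minimal sketch of a forward pass through a byte-level convolutional classifier. It is not the authors' architecture, which the abstract does not specify: the layer choices (byte embedding, one strided 1D convolution, ReLU, global max pooling, sigmoid output), all sizes, and the names `encode`, `init_params`, and `forward` are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

# All sizes are hypothetical and tiny so the sketch runs instantly;
# the abstract mentions half-megabyte files, not 64-byte ones.
MAX_LEN = 64      # fixed input length in bytes
EMBED_DIM = 4     # size of each byte embedding
KERNEL = 8        # convolution window, in bytes
STRIDE = 4        # step between windows
N_FILTERS = 3     # number of convolutional filters
PAD = 256         # extra token id used for padding short files


def encode(raw, max_len=MAX_LEN):
    """Turn raw bytes into a fixed-length list of token ids (0..255, or PAD)."""
    ids = list(raw[:max_len])
    return ids + [PAD] * (max_len - len(ids))


def init_params(seed=0):
    """Random (untrained) parameters for the sketch."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.1, 0.1)

    return {
        "embed": [[rand() for _ in range(EMBED_DIM)] for _ in range(257)],
        "filters": [
            [[rand() for _ in range(EMBED_DIM)] for _ in range(KERNEL)]
            for _ in range(N_FILTERS)
        ],
        "out_w": [rand() for _ in range(N_FILTERS)],
        "out_b": 0.0,
    }


def forward(raw, params):
    """Return a score in (0, 1); with trained weights this would estimate P(malicious)."""
    x = [params["embed"][i] for i in encode(raw)]
    pooled = []
    for filt in params["filters"]:
        best = 0.0  # starting at 0 makes this the max over positions of ReLU(activation)
        for start in range(0, MAX_LEN - KERNEL + 1, STRIDE):
            act = sum(
                filt[k][d] * x[start + k][d]
                for k in range(KERNEL)
                for d in range(EMBED_DIM)
            )
            best = max(best, act)
        pooled.append(best)
    logit = params["out_b"] + sum(w * p for w, p in zip(params["out_w"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    params = init_params()
    print(forward(b"MZ\x90\x00", params))
```

The only inputs are the file's bytes and, during training, its label; no header parsing or other handcrafted feature enters the model, which is the sense of "end-to-end" used in the abstract.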
