Abstract: We propose and evaluate a simple convolutional deep neural network architecture that detects malicious \emph{Portable Executables} (Windows executable files) by learning only from their raw byte sequences and labels, that is, without any domain-specific feature extraction or preprocessing. On a dataset of 20 million \emph{unpacked} half-megabyte Portable Executables, this end-to-end approach achieves performance almost on par with Avast's traditional machine learning pipeline based on handcrafted features.
TL;DR: We train a deep convolutional malware classifier on 20 million Windows EXE files represented as raw sequences of bytes and obtain results almost on par with Avast's ML system based on human-engineered features.
Keywords: deep learning, convolution, malware detection, windows executable files, portable executable, end-to-end, representation learning