Abstract: Malware that infects IoT (Internet of Things) devices such as web cameras and home routers, known as IoT malware, has spread across the Internet. When an IoT malware sample is captured, analyzing it can be time-consuming. Classification can make malware analysis more efficient: if a captured sample can be automatically classified into a family of already-analyzed samples, the analysis results for that family serve as a useful hint for analyzing the new sample. In this research, we focus on (static) disassembly to extract features from samples, which are then used to calculate the similarities between samples for classification. We choose disassembly because disassembling malware binaries can be faster than, for example, dynamic analysis, under which each sample must be run for a few minutes. However, if samples are packed (encrypted and/or compressed), disassembly does not work well. As a first step towards classification, the goal of this paper is to answer two questions: Are most IoT malware samples not packed? And can disassembly-code-based similarity work well for classification? To this end, in experiments using 8,713 in-the-wild IoT malware samples, we conducted entropy analysis and confirmed that most samples were not packed. We then generated similarity matrices based on disassembly code and visualized the samples with t-SNE (t-Distributed Stochastic Neighbor Embedding) based on these matrices. We confirmed that similar samples were mapped close together on a two-dimensional plane and that distinct samples were mapped comparatively far apart. These results suggest that disassembly can work well for classifying IoT malware.
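The entropy analysis mentioned above can be illustrated with a minimal sketch, assuming byte-level Shannon entropy as the measure; the abstract does not state the exact formulation used. The function names `byte_entropy` and `looks_packed` and the 7.0 bits-per-byte threshold are hypothetical choices for illustration, not values taken from the paper. Packed (compressed or encrypted) binaries tend to have byte distributions close to uniform, so their entropy approaches the maximum of 8 bits per byte.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value 0..255
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic flag: high entropy hints at packing.

    The default threshold is an illustrative assumption, not the paper's value.
    """
    return byte_entropy(data) >= threshold
```

In practice one would read each sample with `open(path, "rb").read()` and pass the bytes to `byte_entropy`. A threshold-based check like this is only a heuristic, since unpacked binaries with large embedded compressed resources can also score high.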