Differences in Android Behavior Between Real Device and Emulator: A Malware Detection Perspective

Alejandro Guerra-Manzanares, Hayretdin Bahsi, Sven Nõmm

Published: 2019, Last Modified: 12 May 2023, IoTSMS 2019, Readers: Everyone

Abstract: Behavioral data extracted from emulators or real devices, such as system calls, are utilized in research studies where machine learning models have been employed for mobile malware detection. However, these studies do not explore whether the selection of data source may have an impact on the performance of the models, assuming that both data sources generate similar outputs. We provide a comparative analysis of the data sets obtained from both sources by using statistical techniques, inducing learning models and demonstrating the impact of data source selection on detection models' performance. Our study shows that emulators generate more distinguishable data than real devices, meaning that designers of detection models should pay attention to the data sources utilized in the various steps of the machine learning workflow.
