Abstract: Host intrusion detection systems show promising results in detecting APT campaigns because they use system logs as source data, which provide richer information about the system environment. However, coping with the growth of log volume over time while tracking execution context is a challenge for security analysts. This work therefore presents backbone extraction as a crucial preprocessing step that filters out irrelevant logs. With the logs modeled as provenance graphs, we discard spurious edges to detect residuals whose distinctive node and edge distributions indicate security threats. Applying our methodology to state-of-the-art benchmark datasets, we observed an increase in the performance of one-class classifiers of up to 62% in F1-score and 48% in recall on the Streamspot dataset, and of up to 40% in F1-score and 33% in recall on the DARPA3 THEIA dataset. Moreover, our results indicate that the methodology mitigates the dependency explosion problem and underscore its ability to improve detection by shrinking graph sizes without losing the aspects essential to characterizing attacks.
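To make the idea of discarding spurious edges concrete, the following is a minimal sketch of one well-known backbone extraction technique, the disparity filter, applied to a weighted directed graph. The abstract does not name the specific backbone method used, so the choice of the disparity filter, the function name `disparity_backbone`, the threshold `alpha`, and the toy edge weights (event counts between a process and files) are all assumptions made for illustration only.

```python
from collections import defaultdict


def disparity_backbone(edges, alpha=0.05):
    """Keep edges whose weight is statistically significant for either endpoint.

    edges: dict mapping (src, dst) -> positive weight (e.g. an event count).
    alpha: significance threshold (hypothetical default, not from the paper).
    """
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    for (src, dst), weight in edges.items():
        out_strength[src] += weight
        in_strength[dst] += weight
        out_degree[src] += 1
        in_degree[dst] += 1

    def p_value(weight, strength, degree):
        # With a single edge there is nothing to compare against,
        # so the edge is treated as not significant from this endpoint.
        if degree < 2:
            return 1.0
        # Disparity filter null model: probability of a normalized weight
        # at least this large if the strength were split uniformly at random.
        return (1.0 - weight / strength) ** (degree - 1)

    backbone = {}
    for (src, dst), weight in edges.items():
        p_out = p_value(weight, out_strength[src], out_degree[src])
        p_in = p_value(weight, in_strength[dst], in_degree[dst])
        if min(p_out, p_in) < alpha:
            backbone[(src, dst)] = weight
    return backbone


# Toy provenance graph (made-up data): one process touches ten files,
# but almost all of its activity goes to a single file.
example_edges = {("proc", "f1"): 100}
for i in range(2, 11):
    example_edges[("proc", f"f{i}")] = 1

kept = disparity_backbone(example_edges, alpha=0.05)
```

In this toy graph only the heavily used edge from `proc` to `f1` survives, while the nine single-event edges are dropped as noise. Whether the filtered-out residual or the retained backbone is what feeds the one-class classifier depends on the methodology in the full paper, which this sketch does not attempt to reproduce.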