Anomaly Detection in Cybersecurity Events Through Graph Neural Network and Transformer Based Model: A Case Study with BETH Dataset

Published: 2022, Last Modified: 10 Feb 2025, IEEE Big Data 2022, CC BY-SA 4.0
Abstract: With the increasing prevalence of the internet, the need to detect malicious behavior is growing. This problem can be formulated as an anomaly detection task on provenance data, where attacks appear as anomalies in the behavior of the system. While network data is quite prevalent, we focus on system logs and propose a novel approach with two main components. The first exploits the graph-like structure of the logs, in which processes enact events and generate additional processes: a graph neural network (GNN) produces, in an unsupervised manner, a representation of each event that encodes information about its neighboring events. The second handles complex features such as command arguments, which vary widely and cannot be used in their presented format as features in typical machine learning algorithms. Once encoded with transformer models, these features can be used in other algorithms such as a GNN or an anomaly detector. Combined, the two components improve anomaly detection results on the BETH dataset by around 8 percent compared with the manually engineered features alone.
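To make the first component concrete, the following is a minimal sketch of how neighbor information can enter an event representation. It is not the authors' model: it performs a single untrained mean-aggregation step in plain Python, where a GNN would learn the aggregation. The record layout (`pid`, `ppid`, `features`), the toy values, and the helper names `build_neighbors` and `aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy log records. Field names and values are illustrative assumptions,
# not the actual BETH schema.
events = [
    {"pid": 1, "ppid": 0, "features": [1.0, 0.0]},
    {"pid": 2, "ppid": 1, "features": [0.0, 1.0]},
    {"pid": 3, "ppid": 1, "features": [0.0, 3.0]},
]


def build_neighbors(events):
    """Link every process to its parent (undirected), skipping unknown parents."""
    known = {e["pid"] for e in events}
    neighbors = defaultdict(set)
    for e in events:
        if e["ppid"] in known:
            neighbors[e["pid"]].add(e["ppid"])
            neighbors[e["ppid"]].add(e["pid"])
    return neighbors


def aggregate(events, neighbors):
    """One untrained message-passing step.

    Each node's output is its own features concatenated with the mean of
    its neighbors' features (zeros when it has no neighbors).
    """
    feats = {e["pid"]: e["features"] for e in events}
    dim = len(events[0]["features"])
    out = {}
    for pid, own in feats.items():
        nbrs = sorted(neighbors.get(pid, ()))
        if nbrs:
            mean = [sum(feats[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim
        out[pid] = own + mean
    return out


embeddings = aggregate(events, build_neighbors(events))
```

In this sketch the parent process (pid 1) ends up with a vector that reflects both of its children, which is the kind of context a downstream anomaly detector could use. A transformer encoding of command arguments, the second component, would simply supply additional entries in each `features` vector.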