BehaMiner: System Behavior Mining for Audit Log Based on Graph Learning

Published: 2024, Last Modified: 09 Feb 2026, WASA (1) 2024, CC BY-SA 4.0
Abstract: Endpoint monitoring solutions record system-level events executed by processes as audit logs to support attack investigations. Low-level system logs are challenging for humans to comprehend and therefore need to be transformed into high-level user behavior patterns. However, user behaviors are frequently intermixed with a massive volume of other system activities, and system entities are intertwined in the audit logs. Moreover, behavior patterns vary across operating systems, making it difficult for analyzers to accurately identify all behavior patterns. Existing rule-based methods require expert knowledge and entail high labor costs, while some learning-based approaches are restricted to particular operating systems. To tackle these problems, we propose an automatic system behavior pattern mining model called BehaMiner. First, BehaMiner collects multiple behavior samples, each of which contains the audit logs recorded during the execution of a specific behavior within a time window. Second, for each behavior sample, a system behavior provenance graph is constructed based on attribute and structural information. Third, the behavior provenance graphs are fed into a graph neural network (GNN) model, which captures the crucial structure of each provenance graph through attention mechanisms. Finally, BehaMiner partitions each behavior sample into subgraphs based on the process root node. If masking a subgraph from the sample causes a larger decrease in the probability predicted by the GNN model, that subgraph is considered a behavior pattern. The performance of mining the crucial subgraph is evaluated with the Probability Decrease Rate (PDR) indicator. Experimental results on real-world datasets demonstrate that BehaMiner achieves an average F1 score of 91.33% for behavior graph learning. Moreover, BehaMiner can mine behavior patterns with a PDR 12% higher than that of the 7 baselines.
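The final mining step (partition a sample by process root node, mask each subgraph, and compare the drop in predicted probability) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the abstract does not give the PDR formula, so the sketch assumes the relative drop (p_full - p_masked) / p_full; `toy_predict` is a hypothetical stand-in for the trained GNN; and the edge tuples, entity names, and function names are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Tuple

# One audit event as a provenance edge: (source entity, event type, destination entity).
Edge = Tuple[str, str, str]


def partition_by_root(edges: List[Edge], roots: List[str]) -> Dict[str, FrozenSet[Edge]]:
    """Return, for each root process, the set of edges reachable from it."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)

    subgraphs = {}
    for root in roots:
        seen_nodes, seen_edges, stack = {root}, set(), [root]
        while stack:  # iterative depth-first traversal from the root
            node = stack.pop()
            for edge in outgoing[node]:
                seen_edges.add(edge)
                if edge[2] not in seen_nodes:
                    seen_nodes.add(edge[2])
                    stack.append(edge[2])
        subgraphs[root] = frozenset(seen_edges)
    return subgraphs


def probability_decrease_rate(
    predict: Callable[[List[Edge]], float],
    edges: List[Edge],
    subgraph: FrozenSet[Edge],
) -> float:
    """Assumed PDR: relative drop in predicted probability after masking the subgraph."""
    p_full = predict(edges)
    if p_full == 0:
        return 0.0
    masked = [e for e in edges if e not in subgraph]
    return (p_full - predict(masked)) / p_full


def toy_predict(edges: List[Edge]) -> float:
    """Hypothetical classifier: probability that the sample is a file-upload behavior."""
    has_read = any(e[0] == "scp" and e[1] == "read" for e in edges)
    has_connect = any(e[0] == "scp" and e[1] == "connect" for e in edges)
    return 0.1 + 0.4 * has_read + 0.4 * has_connect


# Invented sample: a file upload intermixed with unrelated background activity.
sample = [
    ("bash", "fork", "scp"),
    ("scp", "read", "/home/u/report.txt"),
    ("scp", "connect", "10.0.0.5:22"),
    ("cron", "fork", "logrotate"),
    ("logrotate", "write", "/var/log/syslog.1"),
]

subgraphs = partition_by_root(sample, roots=["bash", "cron"])
scores = {
    root: probability_decrease_rate(toy_predict, sample, sg)
    for root, sg in subgraphs.items()
}
# The root whose masked subgraph causes the largest drop is the candidate pattern.
pattern_root = max(scores, key=scores.get)
print(scores, pattern_root)
```

In this toy sample, masking the subgraph rooted at `bash` removes the edges the stand-in classifier relies on, so it receives the higher score, while masking the unrelated `cron` subgraph leaves the prediction unchanged. A real pipeline would replace `toy_predict` with the trained attention-based GNN and derive the root processes from the audit logs.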