CIA-EBE: Class Imbalance-Aware Event-Based Embedding for SOC Log Screening

Published: 01 Jan 2024. Last modified: 07 May 2025. Venue: IEEE Big Data 2024. License: CC BY-SA 4.0.
Abstract: Security Operations Centers (SOCs) face significant challenges in processing large volumes of event logs. Traditional log screening methods frequently suffer from high false positive rates (FPR) and struggle to identify subtle, evolving threats such as reconnaissance attacks, which often precede more severe intrusions. This paper introduces Class Imbalance-Aware Event-Based Embedding (CIA-EBE), a novel approach that enhances SOC log screening by transforming individual security events into dense vector representations while emphasizing minority-class events. We evaluate CIA-EBE on a dataset derived from Zeek logs and compare it against conventional embedding techniques such as Word2Vec and Doc2Vec across multiple classifiers. With a Support Vector Machine (SVM) classifier under stratified 5-fold cross-validation, CIA-EBE achieved 0% FPR and 100% recall. Visualizations using t-distributed Stochastic Neighbor Embedding (t-SNE) and hierarchical clustering confirmed the separation between attack and benign events, demonstrating the robustness of CIA-EBE. This study illustrates the potential of AI-driven log screening to improve the accuracy and efficiency of SOC operations, giving analysts better tools for early cyber threat detection.
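The evaluation protocol named in the abstract (stratified 5-fold cross-validation scored by FPR and recall) can be sketched as follows. This is not the authors' code and does not implement CIA-EBE itself. It uses only the Python standard library, substitutes a nearest-centroid classifier for the SVM, pools predictions across folds rather than averaging per-fold scores (an assumption), and runs on made-up two-dimensional vectors standing in for event embeddings. All helper names (`stratified_folds`, `fpr_and_recall`, `nearest_centroid_fit_predict`, `cross_validate`) are hypothetical. Stratification matters here because, with imbalanced data, a plain random split can leave a fold with few or no attack events, which makes per-fold recall unstable or undefined.

```python
# Hypothetical sketch, not the CIA-EBE authors' code: stratified k-fold evaluation
# reporting false positive rate (FPR) and recall, with attack = 1 (positive class).
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
FitPredict = Callable[[List[Vector], List[int], List[Vector]], List[int]]


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that each preserve the class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so rare attack events appear in every fold.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fpr_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """FPR = FP / (FP + TN); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return fpr, recall


def nearest_centroid_fit_predict(train_X: List[Vector], train_y: List[int],
                                 test_X: List[Vector]) -> List[int]:
    """Stand-in classifier (NOT an SVM): label each test vector by its closest class mean."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in test_X]


def cross_validate(X: List[Vector], y: List[int], fit_predict: FitPredict,
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Train on k-1 folds, predict the held-out fold, and pool predictions over all folds."""
    y_true_all: List[int] = []
    y_pred_all: List[int] = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in fold])
        y_true_all.extend(y[i] for i in fold)
        y_pred_all.extend(preds)
    return fpr_and_recall(y_true_all, y_pred_all)


# Toy, clearly separable stand-ins for event embeddings: 40 benign (0), 10 attack (1).
benign = [[float(i % 3), float(i % 2)] for i in range(40)]
attack = [[10.0 + i % 2, 10.0] for i in range(10)]
X = benign + attack
y = [0] * len(benign) + [1] * len(attack)

fpr, recall = cross_validate(X, y, nearest_centroid_fit_predict, k=5)
print(f"FPR={fpr:.0%} recall={recall:.0%}")  # prints "FPR=0% recall=100%" on this toy data
```

The perfect scores above come only from the artificially separated toy vectors and say nothing about CIA-EBE. To move closer to the setup the abstract describes, the stand-in classifier could be replaced by a wrapper around scikit-learn's `SVC` and the hand-written splitter by scikit-learn's `StratifiedKFold`.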