Keywords: Insider Threat Detection, Large Language Models (LLMs), Semantic Analysis, Anomaly Detection, Enterprise Security
Abstract: Insider threats are difficult to detect because malicious or negligent actions occur under valid credentials and blend into ordinary workflows. We present a two-stage pipeline that mirrors SOC practice: Stage-1 performs scalable behavioral anomaly filtering on engineered features (continuous metrics, interpretable binary flags, and weak psychometric priors), producing a hybrid risk score; Stage-2 applies an LLM only to the top-risk subset to generate concise SOC-style narratives that surface intent. Using the full CERT v6 email corpus (~2.63M messages, ~1k users), we show that engineered features capture strong separations between suspicious and background traffic, total-risk scores yield solid baselines, and semantic narratives improve analyst coverage while keeping cost practical (about 100× reduction in LLM load).
Submission Number: 80
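The two-stage triage described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Message` fields, the weights, the linear form of the score, and the 1% cutoff (chosen only because it corresponds to a roughly 100× reduction in items passed on). The abstract does not specify how the hybrid risk score is computed, and the Stage-2 LLM call is deliberately left out.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Message:
    msg_id: str
    continuous: float   # hypothetical continuous metric (e.g. attachment size)
    flags: int          # hypothetical count of raised binary flags
    psych_prior: float  # hypothetical weak psychometric prior in [0, 1]


def hybrid_risk(messages, w_cont=1.0, w_flag=1.0, w_psych=0.25):
    """Stage 1 (sketch): weighted sum of a z-scored metric, flags, and prior.

    The weights are illustrative placeholders, not values from the paper.
    """
    values = [m.continuous for m in messages]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant input
    return {
        m.msg_id: w_cont * (m.continuous - mu) / sigma
        + w_flag * m.flags
        + w_psych * m.psych_prior
        for m in messages
    }


def top_risk_subset(scores, fraction=0.01):
    """Return the ids of the highest-scoring fraction (at least one item).

    Only this subset would be forwarded to the Stage-2 LLM.
    """
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    background = [Message(f"m{i}", 1.0, 0, 0.1) for i in range(199)]
    corpus = background + [Message("risky", 50.0, 3, 0.9)]
    selected = top_risk_subset(hybrid_risk(corpus))
    print(selected)  # ids that would be sent on for narrative generation
```

Keeping the cutoff as a single `fraction` parameter makes the cost trade-off explicit: lowering it reduces LLM load, at the risk of filtering out messages that a semantic pass might have flagged.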