LLM-Enhanced Semantic Analysis for Insider Threat Detection in Enterprise Communication Logs

Agents4Science 2025 Conference Submission 80 Authors

04 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0

Keywords: Insider Threat Detection, Large Language Models (LLM), Semantic Analysis, Anomaly Detection, Enterprise Security

Abstract: Insider threats are difficult to detect because malicious or negligent actions occur under valid credentials and blend into ordinary workflows. We present a two-stage pipeline that mirrors SOC practice. Stage 1 performs scalable behavioral anomaly filtering on engineered features (continuous metrics, interpretable binary flags, and weak psychometric priors) and produces a hybrid risk score. Stage 2 applies an LLM only to the top-risk subset to generate concise SOC-style narratives that surface intent. Using the full CERT v6 email corpus (∼2.63M messages, ∼1k users), we show that engineered features capture strong separations between suspicious and background traffic, that total-risk scores yield solid baselines, and that semantic narratives improve analyst coverage while keeping cost practical (about a 100× reduction in LLM load).
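
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the weights in `hybrid_risk`, the `max_flags` cap, and the 1% selection fraction are all assumptions chosen for illustration, and `narrate` is a hypothetical stand-in for the LLM call. The 1% fraction is picked only because it reproduces a load reduction of about 100×.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    msg_id: str
    continuous_score: float    # assumed: continuous metrics, pre-normalized to [0, 1]
    flag_count: int            # assumed: number of binary flags raised
    psychometric_prior: float  # assumed: weak prior in [0, 1]


def hybrid_risk(m: Message, w_cont: float = 0.6, w_flag: float = 0.3,
                w_psy: float = 0.1, max_flags: int = 5) -> float:
    """Stage 1: combine the three feature groups into one score.
    The weights are illustrative assumptions, not values from the paper."""
    flag_term = min(m.flag_count, max_flags) / max_flags
    return (w_cont * m.continuous_score
            + w_flag * flag_term
            + w_psy * m.psychometric_prior)


def select_top_risk(messages: List[Message], fraction: float = 0.01) -> List[Message]:
    """Keep only the highest-risk fraction of messages for Stage 2."""
    k = max(1, round(len(messages) * fraction))
    return sorted(messages, key=hybrid_risk, reverse=True)[:k]


def narrate(m: Message, llm: Callable[[str], str]) -> str:
    """Stage 2: ask an LLM (any callable taking a prompt) for a short narrative."""
    prompt = (f"Summarize in two sentences why message {m.msg_id} "
              f"(risk {hybrid_risk(m):.2f}) may indicate insider risk.")
    return llm(prompt)


# Synthetic data: 1000 messages with steadily increasing continuous scores.
messages = [Message(f"m{i}", i / 999, 0, 0.0) for i in range(1000)]
top = select_top_risk(messages)

# A stub stands in for the LLM so the sketch runs without network access.
narratives = [narrate(m, llm=lambda p: "stub narrative") for m in top]
load_reduction = len(messages) // len(narratives)
```

With these assumed settings, only the top-scoring 1% of messages reach the LLM stage, which is where the cost saving comes from. In practice the selection threshold would be tuned against analyst capacity and recall on known incidents.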

Submission Number: 80
