LcaLLM: A Deep Latent Cross-Attention Framework for EDR Log Analysis with Large Language Model

ACL ARR 2025 May Submission1973 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Endpoint Detection and Response (EDR) systems play a critical role in safeguarding enterprises against sophisticated threats, particularly advanced persistent threats (APTs). However, detecting abnormal behaviors within the long, complex, and interdependent event sequences found in EDR system logs remains a major challenge. To address this challenge, this paper introduces LcaLLM, a novel EDR log analysis framework that leverages the ability of Large Language Models (LLMs) to understand and represent long sequential data. LcaLLM makes three distinct contributions: (1) a Latent Cross-Attention (LCA) model architecture designed to improve the representation of long EDR event sequences, (2) an Event Semantic Alignment mechanism that enriches structured EDR logs with natural language expressions, aligning them with the language model's input for improved interpretability, and (3) a Multi-Objective Loss Aggregation training approach that enables the model to learn deep, complex relationships among EDR events. We also release EDR47K-40F-v1.0, a large-scale EDR dataset comprising over 47K event records that cover 40 threat families as well as normal activities. LcaLLM outperforms traditional methods and sets new benchmarks in threat detection accuracy and classification precision, achieving 98.32% accuracy in threat identification and a 96.73% success rate in classifying threats across 40 families. We further analyze the impact of latent size, layer depth, and pooling strategies, as well as robustness to dynamics. We open-source the dataset and code at: https://github.com/victorzhz19995/EDR_LcaLLM.
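The abstract does not specify how the LCA module is built, so the following is only a minimal sketch of the general idea behind latent cross-attention, not the authors' architecture: a small, fixed set of latent vectors attends over an arbitrarily long event sequence and compresses it into a fixed-size summary. Everything here is an assumption made for illustration: the function names, the dimensions, the random stand-in embeddings, the absence of learned query/key/value projections, and the mean-pooling step.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def latent_cross_attention(latents, events):
    """Each latent vector (query) attends over all event vectors (keys/values).

    Hypothetical simplification: queries, keys and values are used as-is,
    whereas a trained model would apply learned linear projections first.
    Returns one output vector per latent, regardless of sequence length.
    """
    d = len(latents[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in latents:
        weights = softmax([dot(q, k) * scale for k in events])
        # Weighted average of the event vectors for this latent.
        out.append([sum(w * v[j] for w, v in zip(weights, events)) for j in range(d)])
    return out

def mean_pool(vectors):
    """Average the latent outputs into a single sequence embedding."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

# Illustrative sizes only; none of these values come from the paper.
random.seed(0)
DIM, NUM_LATENTS, SEQ_LEN = 8, 4, 200
latents = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENTS)]
events = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SEQ_LEN)]

summary = latent_cross_attention(latents, events)  # NUM_LATENTS x DIM
embedding = mean_pool(summary)                     # DIM
```

In this sketch the attention cost grows with the number of latents times the sequence length rather than with the square of the sequence length, which is the usual motivation for a latent bottleneck on long inputs. The latent count and the pooling step are the kind of settings the abstract says are analyzed (latent size, pooling strategies).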
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: security/privacy, Benchmarking, LLM Efficiency, Information Retrieval and Text Mining
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 1973