Normalizing Audit Logs Using Large Language Models

KDD 2024 Workshop KiL Submission 10

30 May 2024 (modified: 29 Jun 2024), Submitted to KiL 2024, CC BY 4.0
Keywords: large language models, zero shot learning, open cybersecurity schema framework, velocity template language, classification, constrained LLM generation, LLM output evaluation, prompt engineering, log management, log normalization
TL;DR: Normalizing audit logs from various ISVs by generating VTL templates that map input events from ISVs to OCSF format, using zero-shot learning with Large Language Models.
Abstract: We present a novel approach for normalizing audit logs from various Independent Software Vendors (ISVs) by generating Velocity Template Language (VTL) templates that map input events from ISVs to the Open Cybersecurity Schema Framework (OCSF) format, using zero-shot learning with Large Language Models (LLMs). In this approach, we use hierarchical classification to classify events from an ISV into the appropriate OCSF event categories, event classes, and event activities. We then use the JSON schema for the generated OCSF event classes to generate VTL templates, which map the fields in the input events to the fields in the OCSF format. We use the ISV event name and description for the event classification task, and the event JSON schema and a collection of sample event logs for the VTL template generation task. We evaluate the results of the two tasks using human-generated event mappings and VTL templates, respectively, for various ISVs as ground truth. We also use a different LLM to evaluate the outputs of the two tasks, generating confidence scores and qualitative assessments for both tasks with an evaluation prompt. If the confidence score is lower than a preset threshold, the generated qualitative feedback is used to improve the LLM output for the VTL template generation task. This work helps improve the error-prone and time-consuming audit log normalization process by doubling the event classification accuracy obtained through human annotators and by reducing the VTL template generation process for new ISVs from 2 days to half a day.
Submission Number: 10
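
The feedback step described in the abstract (regenerate the VTL template when the evaluator's confidence falls below a preset threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Evaluation` type, the `refine_template` function, the 0-to-1 confidence scale, the default threshold, the round limit, and the two stub functions standing in for the generator and evaluator LLM calls are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    confidence: float  # evaluator LLM's confidence score (assumed to lie in [0, 1])
    feedback: str      # evaluator LLM's qualitative assessment


def refine_template(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Evaluation],
    threshold: float = 0.8,  # hypothetical preset threshold
    max_rounds: int = 3,     # hypothetical cap on generation attempts
) -> Tuple[str, Evaluation]:
    """Regenerate a template until the confidence meets the threshold or rounds run out."""
    template = generate(None)  # first attempt, no feedback available yet
    result = evaluate(template)
    rounds = 1
    while result.confidence < threshold and rounds < max_rounds:
        # Feed the qualitative assessment back into the generator.
        template = generate(result.feedback)
        result = evaluate(template)
        rounds += 1
    return template, result


# Stubs standing in for the two LLM calls (purely illustrative).
def fake_generate(feedback: Optional[str]) -> str:
    if feedback is None:
        return '{"activity_id": 1}'
    return '{"activity_id": 1, "time": $event.timestamp}'


def fake_evaluate(template: str) -> Evaluation:
    if "time" in template:
        return Evaluation(0.9, "looks complete")
    return Evaluation(0.4, "missing the time field")


final_template, final_eval = refine_template(fake_generate, fake_evaluate)
```

With these stubs, the first template scores below the threshold, the feedback triggers one regeneration, and the loop stops once the second template passes. Passing the two LLM calls in as plain callables keeps the loop independent of any particular model provider.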