CTIKG: LLM-Powered Knowledge Graph Construction from Cyber Threat Intelligence

Liangyi Huang; Xusheng Xiao

Published: 10 Jul 2024, Last Modified: 26 Aug 2024, Venue: COLM, License: CC BY 4.0

Research Area: LMs with tools and code, LMs on diverse modalities and novel applications

Keywords: Large Language Model, Machine Learning and Security, Knowledge Graph, Information Extraction

TL;DR: Collaboration of multiple LLM Agents for knowledge extraction from articles in the computer security field

Abstract: To gain visibility into the evolving threat landscape, knowledge of cyber threats has been aggressively collected across organizations and is often shared through Cyber Threat Intelligence (CTI). While CTI knowledge can be shared via structured formats such as Indicators of Compromise (IOC), articles in technical blogs and posts in forums (referred to as CTI articles) provide more comprehensive descriptions of observed real-world attacks. However, existing works can only analyze standard texts from mainstream cyber threat knowledge bases such as CVE and NVD, and lack the capability to link multiple CTI articles to uncover the relationships among security-related entities such as vulnerabilities. In this paper, we propose a novel approach, CTIKG, that utilizes prompt engineering to efficiently build a security-oriented knowledge graph from CTI articles based on LLMs. To mitigate the randomness, hallucinations, and token limits of LLMs, CTIKG divides an article into segments and employs multiple LLM agents with a dual memory design to (1) process each text segment separately and (2) summarize the results of the text segments to generate more accurate results. We evaluate CTIKG on two representative benchmarks built from real-world CTI articles, and the results show that CTIKG achieves 86.88% precision in building security-oriented knowledge graphs, an improvement of at least 30% over state-of-the-art techniques. We also demonstrate that the retry mechanism makes open-source language models outperform GPT-4 for building knowledge graphs.
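
The two-step idea in the abstract (process each segment separately, then combine the per-segment results) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the word-count segmentation rule, and the count-based merge are all assumptions made for illustration, and the extractor is passed in as a plain callable rather than a real LLM agent. The dual memory design and the retry mechanism are not modeled here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# A knowledge-graph edge as (subject, relation, object). Hypothetical shape.
Triple = Tuple[str, str, str]


def split_into_segments(text: str, max_words: int = 200) -> List[str]:
    """Split an article into word-bounded segments (assumed sizing rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_graph(
    article: str,
    extract_triples: Callable[[str], Iterable[Triple]],
    max_words: int = 200,
) -> Counter:
    """Run the extractor on each segment separately, then merge the results.

    `extract_triples` stands in for an LLM agent; any callable that maps a
    text segment to an iterable of triples will do.
    """
    merged: Counter = Counter()
    for segment in split_into_segments(article, max_words):
        for subj, rel, obj in extract_triples(segment):
            # Normalize case and whitespace so the same fact found in two
            # segments collapses into a single edge with a higher count.
            key = (subj.strip().lower(), rel.strip().lower(), obj.strip().lower())
            merged[key] += 1
    return merged
```

The returned `Counter` maps each normalized triple to the number of segments that produced it, which a caller could use as a rough agreement signal when deciding which edges to keep.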

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 74
