Context-Augmented Key Phrase Extraction from Short Texts for Cyber Threat Intelligence Tasks

Avishek Bose, Huichen Yang, Marissa Shivers, Ahat Orazgeldiyev, William H. Hsu

Published: 2023, Last Modified: 13 Nov 2024, Venue: ISI 2023, License: CC BY-SA 4.0

Abstract: In this paper, we address contextual limitations of current deep learning-based and heuristic key phrase extraction tools as applied to the cybersecurity domain. We develop a hybrid system that augments state-of-the-art (SOTA) transformers for key phrase sequence labeling, using a novel set of part-of-speech (POS) and role-aware tagging rules to generate fine-grained tag sequences from short text corpora. Next, we fine-tune multiple SOTA deep learning (DL) language model (LM) architectures on these transformed sequences. We then evaluate the architectures by comparing the outputs of the respective LMs, and select the best-performing underlying transformers for extracting cybersecurity key phrases. The resulting ensemble achieves substantial predictive gains over SOTA baselines on general cybersecurity corpora: F1 scores are at least 25% higher than those of hybrid SOTA transformers fine-tuned using baseline tagging rules on the generic corpus, at the cost of a much smaller tradeoff (less than 5% in F1) on a vulnerability-specific corpus.
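
To make the idea of POS-based tagging rules concrete, the following is a minimal sketch of how a short text could be turned into a key phrase tag sequence suitable for fine-tuning a sequence labeler. It is not the rule set from the paper, which the abstract does not specify: the single rule used here (a run of adjectives and nouns containing at least one noun is a key phrase), the `B-KP`/`I-KP`/`O` label names, the function name `tag_key_phrases`, and the hand-assigned POS tags in the example are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical rule (an assumption, not the paper's rule set):
# a maximal run of ADJ/NOUN/PROPN tokens that contains at least
# one NOUN or PROPN is labeled as a single key phrase.
PHRASE_POS = {"ADJ", "NOUN", "PROPN"}
HEAD_POS = {"NOUN", "PROPN"}


def tag_key_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Map (token, POS) pairs to B-KP / I-KP / O labels."""
    labels = ["O"] * len(tagged)
    i = 0
    while i < len(tagged):
        if tagged[i][1] not in PHRASE_POS:
            i += 1
            continue
        # Extend j to the end of the current ADJ/NOUN/PROPN run.
        j = i
        while j < len(tagged) and tagged[j][1] in PHRASE_POS:
            j += 1
        # Keep the run only if it contains a nominal head.
        if any(pos in HEAD_POS for _, pos in tagged[i:j]):
            labels[i] = "B-KP"
            for k in range(i + 1, j):
                labels[k] = "I-KP"
        i = j
    return labels


# POS tags are assigned by hand here; in practice they would come
# from a POS tagger.
example = [
    ("Attackers", "NOUN"), ("exploit", "VERB"), ("a", "DET"),
    ("remote", "ADJ"), ("code", "NOUN"), ("execution", "NOUN"),
    ("flaw", "NOUN"), (".", "PUNCT"),
]

if __name__ == "__main__":
    for (token, _), label in zip(example, tag_key_phrases(example)):
        print(f"{token}\t{label}")
```

Label sequences of this kind are the usual training target for token-classification fine-tuning; a role-aware rule set as described in the abstract would presumably use a richer label inventory than the single key phrase class assumed here.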