Cybersecurity Named Entity Recognition Using Multi-Modal Ensemble Learning

Feng Yi; Bo Jiang; Lu Wang; Jianjun Wu

Published: 01 Jan 2020, Last Modified: 08 Feb 2025 | IEEE Access 2020 | CC BY-SA 4.0

Abstract: Cybersecurity named entity recognition is an important part of threat information extraction from large-scale unstructured text collections in many cybersecurity applications. Most existing security entity recognition studies and systems use regular-expression matching strategies or machine learning algorithms. Security named entities are peculiar and complex, yet these models ignore the characteristics of security data and the correlations among entities. Therefore, based on an in-depth study of security entity characteristics, we propose a novel security named entity recognition model that combines regular expressions, a known-entity dictionary, and conditional random fields (CRF) with four feature templates. This model is named RDF-CRF. The rule-based expressions match security entities with good accuracy in simpler situations; the known-entity dictionary extracts common and specific security entities; and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to further improve recognition performance. To demonstrate the effectiveness of the proposed model, extensive experiments are performed on a security text dataset collected from public security websites. The experimental results show that RDF-CRF achieves better performance than state-of-the-art methods.
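The layering described in the abstract, where rule-based and dictionary-based matches feed the CRF stage, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the regular expressions, the dictionary entries, the label names, and the feature keys are hypothetical and are not the paper's actual rules or its four feature templates. The sketch stops at building per-token feature dictionaries of the kind a CRF toolkit could consume; it does not train a CRF.

```python
import re

# Hypothetical patterns for illustration only; the paper's rules are not
# listed in the abstract.
RULES = {
    "IP": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "CVE": re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE),
    "MD5": re.compile(r"^[a-fA-F0-9]{32}$"),
}

# Hypothetical known-entity dictionary: lowercased surface form -> label.
KNOWN_ENTITIES = {
    "wannacry": "MALWARE",
    "microsoft": "ORG",
}


def rule_tag(token):
    """Return the label of the first matching rule, or "O" if none match."""
    for label, pattern in RULES.items():
        if pattern.match(token):
            return label
    return "O"


def dict_tag(token):
    """Return the dictionary label for a token, or "O" if it is unknown."""
    return KNOWN_ENTITIES.get(token.lower(), "O")


def token_features(tokens, i):
    """Build a feature dict for token i, including rule and dictionary tags
    for the token and its immediate neighbours."""
    token = tokens[i]
    feats = {
        "lower": token.lower(),
        "has_digit": any(c.isdigit() for c in token),
        "rule_tag": rule_tag(token),
        "dict_tag": dict_tag(token),
    }
    if i > 0:
        feats["prev_rule_tag"] = rule_tag(tokens[i - 1])
        feats["prev_dict_tag"] = dict_tag(tokens[i - 1])
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_rule_tag"] = rule_tag(tokens[i + 1])
        feats["next_dict_tag"] = dict_tag(tokens[i + 1])
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """Feature dicts for every token in a whitespace-tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sample = "WannaCry exploited CVE-2017-0144 from 192.168.1.10".split()
    for tok, feats in zip(sample, sentence_features(sample)):
        print(tok, feats["rule_tag"], feats["dict_tag"])
```

In this sketch the rule and dictionary outputs enter the sequence model as ordinary features, so a CRF could learn when to trust them and when surrounding context should override them. Whether RDF-CRF integrates the two extractors in exactly this way is not stated in the abstract.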
