Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs

Published: 01 Jan 2024 · Last Modified: 20 May 2025 · ISQED 2024 · CC BY-SA 4.0
Abstract: Graph neural network-based network intrusion detection systems have recently demonstrated state-of-the-art performance on benchmark datasets. Nevertheless, these methods rely on target encoding for data pre-processing, which limits widespread adoption because target encoding requires annotated labels, a cost-prohibitive requirement. In this work, we propose a solution combining in-context pre-training with dense representations for categorical features to jointly overcome this label dependency. Our approach is remarkably data-efficient, achieving over 98% of the performance of the supervised state of the art with less than 4% labeled data on the NF-UQ-NIDS-V2 dataset.
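The contrast the abstract draws, between label-dependent target encoding and label-free dense representations, can be made concrete with a minimal sketch. The feature names, data, and embedding dimension below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical feature of a network flow: its protocol.
protocols = ["tcp", "udp", "icmp"]
vocab = {p: i for i, p in enumerate(protocols)}

# Target encoding (label-dependent): replace each category with the mean
# attack label observed for that category -- annotated labels are required.
flows = ["tcp", "udp", "tcp", "icmp", "udp", "tcp"]
labels = [1, 0, 0, 1, 0, 1]  # 1 = attack; needed only by target encoding
target_enc = {
    p: float(np.mean([y for f, y in zip(flows, labels) if f == p]))
    for p in protocols
}

# Dense alternative (label-free): a learnable embedding table, initialized
# randomly here and refined later, e.g. by self-supervised pre-training.
emb_dim = 4
embedding_table = rng.normal(size=(len(vocab), emb_dim))

def embed(feature: str) -> np.ndarray:
    """Look up the dense vector for a categorical value; no labels needed."""
    return embedding_table[vocab[feature]]

print(target_enc["tcp"])   # a scalar derived from the labels
print(embed("tcp").shape)  # a (4,) dense vector, independent of labels
```

The key point of the sketch: building `target_enc` consumes the `labels` array, whereas `embedding_table` never touches it, which is what makes the dense representation compatible with pre-training on unlabeled traffic.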