Abstract: Network intrusion detection (NID) must contend with continually shifting network traffic patterns (distribution shifts), which require NID systems to evolve continuously. While prior art emphasizes fully supervised, annotation-intensive continual learning methods for NID, semi-supervised continual learning (SSCL) methods require only limited annotated data. However, the inherent class imbalance (CI) in network traffic can significantly degrade the performance of SSCL approaches. Previous approaches to the CI problem store a subset of labeled training samples from all past tasks in memory for an extended duration, potentially raising privacy concerns. The proposed Semisupervised Privacy-preserving Intrusion detection with Drift-aware continual LEaRning (SPIDER) is a novel method that combines gradient projection memory (GPM) with SSCL to handle CI effectively without storing labeled samples from all previous tasks. We assess SPIDER's performance against baselines on six intrusion detection benchmarks formed over a short period and on the AnoShift benchmark spanning ten years, which includes natural distribution shifts. Additionally, we validate our approach on standard continual learning image classification benchmarks, which exhibit more frequent distribution shifts than NID benchmarks. SPIDER achieves performance comparable to fully supervised and semi-supervised baselines while requiring at most 20% annotated data and reducing total training time by 2×.
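The abstract names gradient projection memory (GPM) as the core mechanism for avoiding stored samples. A minimal NumPy sketch of the generic GPM projection step follows: gradients computed on a new task are projected onto the orthogonal complement of a stored orthonormal basis spanning directions important to past tasks, so new updates do not interfere with them. This is an illustrative sketch of the general GPM idea, not the authors' implementation; the function and variable names are assumptions.

```python
import numpy as np

def project_gradient(grad, basis):
    """Project `grad` orthogonal to the subspace spanned by `basis`.

    grad  : (d,) gradient vector for the current task.
    basis : (d, k) matrix whose columns are assumed orthonormal,
            spanning directions important to past tasks.
    Returns grad with its component in span(basis) removed:
        g' = g - B (B^T g)
    """
    return grad - basis @ (basis.T @ grad)

# Illustrative usage: a past-task basis along the first coordinate axis.
grad = np.array([1.0, 2.0, 3.0])
basis = np.array([[1.0], [0.0], [0.0]])
projected = project_gradient(grad, basis)
```

After projection, the update has no component along the protected direction, which is how GPM-style methods limit forgetting without replaying stored labeled samples.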