A Data-Driven Solution for Improving Transferability of Traffic Flow Feature Selection

Pegah Golchin, Nima Rafiee, Ralf Kundel

Published: 2024, Last Modified: 11 May 2025, IFIP Networking 2024, CC BY-SA 4.0

Abstract: The expansion of Internet connectivity has increased cyber threats in computer networks. Machine Learning (ML)-based Intrusion Detection Systems (IDS) have emerged as a promising countermeasure, leveraging ML models to analyze network traffic features and differentiate between malicious and benign flows. However, before using ML models, a crucial preprocessing step called feature selection is performed in ML-based IDS to identify the most relevant features that can enhance detection accuracy, streamline ML models, and reduce computational complexity. The selected features need to be transferable across diverse network traffic datasets, which is challenging due to variations in attack types, network architectures, and complex relationships among their flow features. In this work, we present a Data-Driven Ensemble Feature Selection (DD-EFS) method to improve the transferability of the selected features across various network traffic datasets. Our results demonstrate an average increase in detection performance of up to 6.8%, 5.1%, and 4.3% across two distinct, previously unseen network traffic datasets for the Random Forest, Logistic Regression, and Multi-Layer Perceptron models, respectively.