NetBench: A LARGE-SCALE AND COMPREHENSIVE NETWORK TRAFFIC BENCHMARK DATASET FOR FOUNDATION MODELS
Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of
packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing
network traffic is crucial for ensuring the performance, security, and reliability of a network. However,
a significant challenge in network traffic analysis is processing diverse data packets, including both
ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they
often rely on different datasets for performance evaluation. This inconsistency results in substantial
manual data processing efforts and unfair comparisons. Moreover, some data processing methods may
cause data leakage due to improper separation of training and testing data. To address these issues, we
introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine
learning models, especially foundation models, in both network traffic classification and generation
tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum
of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight
state-of-the-art (SOTA) classification models (including two foundation models) and two generative
models using our benchmark. The results show that foundation models significantly outperform
traditional deep learning methods in traffic classification. We believe NetBench will facilitate
fair comparisons among various approaches and advance the development of foundation models for
network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
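
The data leakage concern raised in the abstract can be illustrated with a minimal sketch. It assumes that leakage arises when packets from the same flow end up in both the training and the testing set, and it avoids this by splitting at the flow level. The record layout `(flow_id, payload)` and the function `split_by_flow` are hypothetical names chosen for illustration; they are not NetBench's actual format or API, and the benchmark's own splitting procedure may differ.

```python
import random
from collections import defaultdict


def split_by_flow(packets, test_fraction=0.2, seed=0):
    """Split packet records so that no flow appears in both train and test.

    `packets` is a list of (flow_id, payload) tuples. This layout is a
    hypothetical stand-in, not the format used by NetBench.
    """
    # Group packets by the flow they belong to.
    flows = defaultdict(list)
    for flow_id, payload in packets:
        flows[flow_id].append((flow_id, payload))

    # Shuffle flow identifiers (not individual packets) reproducibly.
    flow_ids = sorted(flows)
    random.Random(seed).shuffle(flow_ids)

    # Reserve a fraction of whole flows for testing, at least one flow.
    n_test = max(1, round(len(flow_ids) * test_fraction))

    test = [p for fid in flow_ids[:n_test] for p in flows[fid]]
    train = [p for fid in flow_ids[n_test:] for p in flows[fid]]
    return train, test


# Toy data: 100 packets spread evenly over 10 flows.
packets = [(f"flow{i % 10}", f"pkt{i}") for i in range(100)]
train, test = split_by_flow(packets)

train_flows = {fid for fid, _ in train}
test_flows = {fid for fid, _ in test}
print(len(train), len(test), train_flows & test_flows)
```

A naive per-packet shuffle on the same toy data would almost certainly place packets of every flow on both sides of the split, which is the kind of overlap the flow-level grouping is meant to rule out.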