Keywords: Traffic Classification, Pre-training, BERT, FlowletFormer
TL;DR: FlowletFormer is a pretraining model for network traffic analysis that improves classification by segmenting traffic into semantically meaningful units, capturing multi-layer protocol semantics, and enhancing inter-packet learning.
Abstract: Network traffic classification with pre-training has achieved promising results, yet existing methods fail to represent cross-packet context, protocol-aware structure, and flow-level behaviors in traffic. To address these challenges, this paper rethinks traffic representation and proposes Flowlet-based pre-training for network analysis. First, we introduce Flowlet and Field Tokenization, which segments traffic into semantically coherent units. Second, we design a Protocol Stack Alignment Embedding Layer that explicitly encodes multi-layer protocol semantics. Third, we develop two pre-training tasks motivated by the Flowlet structure to enhance both intra-packet field understanding and inter-flow behavioral learning. Experimental results show that FlowletFormer significantly outperforms existing methods in classification accuracy, few-shot learning, and traffic representation. Moreover, by integrating domain-specific network knowledge, FlowletFormer shows better comprehension of the principles of network transmission (e.g., stateful TCP connections), providing a more robust and trustworthy framework for traffic analysis.
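To make the segmentation idea concrete, the sketch below shows one common way to split a packet flow into flowlets: a new flowlet begins whenever the inter-packet gap exceeds an idle threshold. This is a hypothetical illustration of the general flowlet concept; the paper's actual tokenization criterion, threshold, and field handling are not specified here.

```python
def segment_flowlets(timestamps, gap_threshold=0.05):
    """Group sorted packet timestamps (seconds) into flowlets.

    A new flowlet starts whenever the gap since the previous packet
    exceeds gap_threshold. Both the function name and the threshold
    value are illustrative assumptions, not the paper's method.
    """
    flowlets = []
    current = []
    prev = None
    for t in timestamps:
        if prev is not None and t - prev > gap_threshold:
            flowlets.append(current)  # idle gap: close the current flowlet
            current = []
        current.append(t)
        prev = t
    if current:
        flowlets.append(current)
    return flowlets

# Example: a three-packet burst, a 0.2 s pause, then two more packets.
print(segment_flowlets([0.00, 0.01, 0.02, 0.22, 0.23]))
# → [[0.0, 0.01, 0.02], [0.22, 0.23]]
```

Segmenting on idle gaps yields units that tend to align with application-level bursts (e.g., a request or a response), which is what makes flowlets semantically more coherent than fixed-size packet windows.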
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 809