Keywords: Traffic Classification, Pre-training, BERT, FlowletFormer
TL;DR: FlowletFormer is a pretraining model for network traffic analysis that improves classification by segmenting traffic into semantically meaningful units, capturing multi-layer protocol semantics, and enhancing inter-packet learning.
Abstract: Network traffic classification with pre-training has achieved promising results, yet existing methods fail to represent cross-packet context, protocol-aware structure, and flow-level behaviors in traffic. To address these challenges, this paper rethinks traffic representation and proposes Flowlet-based pre-training for network analysis. First, we introduce Flowlet and Field Tokenization, which segments traffic into semantically coherent units. Second, we design a Protocol Stack Alignment Embedding Layer that explicitly encodes multi-layer protocol semantics. Third, we develop two pre-training tasks motivated by the Flowlet structure to enhance both intra-packet field understanding and inter-flow behavioral learning. Experimental results show that FlowletFormer significantly outperforms existing methods in classification accuracy, few-shot learning, and traffic representation. Moreover, by integrating domain-specific network knowledge, FlowletFormer shows better comprehension of the principles of network transmission (e.g., stateful TCP connections), providing a more robust and trustworthy framework for traffic analysis.
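To make the segmentation idea concrete, the sketch below shows one common way to split a packet flow into flowlets: a new flowlet begins whenever the inter-packet gap exceeds an idle threshold. This is a hypothetical illustration of the general flowlet concept; the paper's actual tokenization criterion, threshold, and field handling are not specified here.

```python
def segment_flowlets(timestamps, gap_threshold=0.05):
    """Group sorted packet timestamps (seconds) into flowlets.

    A new flowlet starts whenever the gap since the previous packet
    exceeds gap_threshold. Both the function name and the threshold
    value are illustrative assumptions, not the paper's method.
    """
    flowlets = []
    current = []
    prev = None
    for t in timestamps:
        if prev is not None and t - prev > gap_threshold:
            flowlets.append(current)  # idle gap: close the current flowlet
            current = []
        current.append(t)
        prev = t
    if current:
        flowlets.append(current)
    return flowlets

# Example: a three-packet burst, a 0.2 s pause, then two more packets.
print(segment_flowlets([0.00, 0.01, 0.02, 0.22, 0.23]))
# → [[0.0, 0.01, 0.02], [0.22, 0.23]]
```

Segmenting on idle gaps yields units that tend to align with application-level bursts (e.g., a request or a response), which is what makes flowlets semantically more coherent than fixed-size packet windows.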
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 809