TrafficBT: Advancing Pre-trained Language Models for Network Traffic Classification with Multimodal Traffic Representations
Keywords: Network traffic classification, pre-trained language models, multimodal representation learning, semantics, data augmentation
TL;DR: This paper introduces TrafficBT, a novel framework that achieves state-of-the-art network traffic classification by fusing payload semantics from a pre-trained BERT model with spatio-temporal features captured by a dedicated Transformer architecture.
Abstract: Advances in pre-training and large language models have led to the widespread adoption of pre-trained models for network traffic classification, enhancing service quality, security, and stability. However, most existing pre-training-based methods focus solely on payload semantics, neglecting temporal dependencies between packets and relying on single-dimensional static feature learning. This limitation reduces their robustness and generalization in dynamic and heterogeneous network environments. To address these challenges, we propose TrafficBT, a universal traffic classification framework that combines pre-training with multimodal fine-tuning. It extracts both semantic and spatio-temporal features and uses data augmentation to handle data scarcity and class imbalance. During pre-training, TrafficBT leverages large-scale public and real-world traffic datasets to learn domain-specific semantic representations from payloads. In the fine-tuning stage, it adopts a multimodal learning framework that employs a gating network to fuse BERT with a three-layer Transformer architecture, enabling the model to capture both payload semantics and temporal transmission patterns. Experiments show that TrafficBT achieves F1 scores above 0.99 on most real-world and benchmark datasets and outperforms eight state-of-the-art baselines across eight downstream tasks. Notably, it improves performance by 21% on encrypted proxy website classification, demonstrating strong robustness and generalization.
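The gated fusion of the two branches described in the abstract can be sketched as follows. This is a minimal NumPy illustration of a generic gating network, not the paper's exact implementation; the embedding dimension, weight shapes, and per-dimension sigmoid gate are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8  # hypothetical embedding dimension

# Hypothetical per-flow representations from the two branches:
h_sem = rng.standard_normal(d)  # payload semantics (e.g., from the BERT branch)
h_tmp = rng.standard_normal(d)  # spatio-temporal features (e.g., from the Transformer branch)

# Gating network: a learned linear projection of the concatenated
# features, squashed to (0, 1) per dimension by a sigmoid.
W = rng.standard_normal((d, 2 * d))
b = np.zeros(d)
gate = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([h_sem, h_tmp]) + b)))

# Convex combination of the two modalities, weighted by the gate.
h_fused = gate * h_sem + (1.0 - gate) * h_tmp
```

In this kind of scheme, the gate lets the model weight payload semantics against transmission patterns differently for each flow, rather than using a fixed concatenation.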
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 10733