ConViTML: A Convolutional Vision Transformer-Based Meta-Learning Framework for Real-Time Edge Network Traffic Classification

Published: 01 Jan 2024, Last Modified: 12 Feb 2025. IEEE Transactions on Network and Service Management, 2024. License: CC BY-SA 4.0
Abstract: Traditional traffic classification methods struggle to identify emerging network traffic because they require model retraining, which hampers the real-time response of deployed edge devices. Moreover, samples of emerging network traffic are often scarce, and traditional methods typically treat a session as a single image, thereby overlooking essential structural features. These factors can result in poor generalization of the trained model. To overcome these challenges, we propose ConViTML (Convolutional Vision Transformer-based Meta-Learning), a real-time end-to-end network traffic classification framework that employs meta-learning to avoid model retraining. We propose a novel feature extraction network, the Convolutional Vision Transformer (ConViT), which merges a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). ConViT directly extracts low-dimensional discriminative features capturing both the basic and the structural features of a session, which is vital for improving detection accuracy and accelerating convergence in data-scarce environments. Furthermore, we employ a Packet-based Relation Network (PRN) to measure the matching degree between support samples and query samples. As a result, accurate classification in novel traffic identification tasks can be achieved with only a few labeled samples, eliminating the need for extensive data collection and labeling. Finally, we substitute various feature extractors and compare our approach with the classic meta-learning framework Relation Network (RelationNet). Extensive experimental results demonstrate that ConViTML outperforms the alternatives across a range of performance indicators.
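To make the described pipeline concrete, the following is a minimal, illustrative sketch of the two components the abstract outlines: a hybrid CNN-plus-Transformer feature extractor and a relation module that scores support/query pairs for few-shot classification. All layer sizes, class names, and the session-as-byte-image input format are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a hybrid CNN + Transformer feature extractor feeding a
# relation module that scores support/query pairs, in the spirit of ConViT + PRN.
# Layer sizes and the 1xHxW "session image" input format are assumptions.
import torch
import torch.nn as nn


class ConvViTEncoder(nn.Module):
    """CNN stem extracts local (basic) features; a Transformer encoder models
    relations across patch positions (structural features of the session)."""

    def __init__(self, in_ch=1, embed_dim=64, num_heads=4, depth=2):
        super().__init__()
        # CNN stem: each session is assumed to be rendered as a 1xHxW byte image.
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, embed_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=128, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                      # x: (B, 1, H, W)
        f = self.stem(x)                       # (B, D, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens
        tokens = self.transformer(tokens)
        return tokens.mean(dim=1)              # (B, D) low-dimensional embedding


class RelationModule(nn.Module):
    """Scores the matching degree between a support embedding and a query embedding."""

    def __init__(self, embed_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, support, query):         # both: (B, D)
        return self.mlp(torch.cat([support, query], dim=1))  # (B, 1) relation score


if __name__ == "__main__":
    encoder, relation = ConvViTEncoder(), RelationModule()
    support = torch.randn(5, 1, 32, 32)   # toy 5-way, 1-shot support sessions
    query = torch.randn(1, 1, 32, 32)     # one query session
    s_emb, q_emb = encoder(support), encoder(query)
    scores = relation(s_emb, q_emb.expand_as(s_emb))  # match query against each class
    print("predicted class:", scores.argmax().item())
```

In the meta-learning setting the abstract describes, the relation scores would be trained episodically so that new traffic classes can be recognized at inference time from a handful of labeled support sessions, without retraining the encoder.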