TrafficAudio: Audio Representation for Lightweight Encrypted Traffic Classification in IoT

Yilu Chen, Ye Wang, Ruonan Li, Yujia Xiao, Lichen Liu, Jinlong Li, Yan Jia, Zhaoquan Gu

Published: 2026, Last Modified: 12 Mar 2026. IEEE Trans. Netw. Serv. Manag. 2026. License: CC BY-SA 4.0

Abstract: Encrypted traffic classification has become a crucial task for network management and security with the widespread adoption of encrypted protocols across the Internet and the Internet of Things (IoT). However, existing methods often rely on discrete representations and complex models, which leads to incomplete feature extraction, limited fine-grained classification accuracy, and high computational costs. To address these issues, we propose TrafficAudio, a novel encrypted traffic classification method based on audio representation. TrafficAudio comprises three modules: audio representation generation (ARG), audio feature extraction (AFE), and spatiotemporal traffic classification (STC). Specifically, the ARG module first represents raw network traffic as audio to preserve the temporal continuity of the traffic. The AFE module then processes the audio to compute low-dimensional Mel-frequency cepstral coefficients (MFCC), which encode both temporal and spectral characteristics. Finally, the STC module extracts spatiotemporal features from the MFCC through a parallel architecture of one-dimensional convolutional neural network and bidirectional gated recurrent unit layers, enabling fine-grained traffic classification. Experiments on five public datasets across six classification tasks demonstrate that TrafficAudio consistently outperforms ten state-of-the-art baselines, achieving accuracies of 99.74%, 98.40%, 99.76%, 99.25%, 99.77%, and 99.74%. Furthermore, TrafficAudio significantly reduces computational complexity, cutting floating-point operations by 86.88% and model parameters by 43.15% relative to the best-performing baseline.
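
The first two stages described in the abstract (traffic to audio, then audio to MFCC) can be illustrated with a minimal sketch. This is not the authors' implementation: the byte-to-sample mapping, the function names (`bytes_to_waveform`, `mel_filterbank`, `mfcc`), and every parameter value (8 kHz sample rate, 64-sample frames, 20 mel filters, 13 coefficients) are assumptions chosen only to keep the example small. It uses the textbook MFCC recipe (windowed frames, power spectrum, triangular mel filters, log, DCT-II) and the Python standard library only.

```python
import cmath
import math


def bytes_to_waveform(payload):
    """Assumed ARG-style mapping: each byte (0..255) becomes a sample in [-1, 1]."""
    return [(b - 127.5) / 127.5 for b in payload]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as fractional FFT bin positions.
    edges = [
        mel_to_hz(top_mel * i / (n_filters + 1)) * n_fft / sample_rate
        for i in range(n_filters + 2)
    ]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, n_fft=64, hop=32, n_filters=20, n_mfcc=13):
    """Return one list of n_mfcc coefficients per frame (all sizes are assumptions)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_fft - 1)) for n in range(n_fft)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        # Naive DFT power spectrum; fine for a tiny demo, use an FFT in practice.
        power = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)
        ]
        # Small epsilon guards log(0) for narrow filters that cover no FFT bin.
        log_energy = [
            math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank
        ]
        # DCT-II decorrelates the log mel energies into cepstral coefficients.
        features.append([
            sum(e * math.cos(math.pi * i * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for i in range(n_mfcc)
        ])
    return features


# Hypothetical 256-byte payload standing in for one flow's raw bytes.
payload = bytes(range(256))
waveform = bytes_to_waveform(payload)
features = mfcc(waveform)
print(len(features), len(features[0]))  # prints: 7 13
```

The resulting frames-by-coefficients matrix is the kind of low-dimensional input that a parallel 1D-CNN and BiGRU classifier, as named in the abstract, could consume. That final stage is omitted here because the abstract gives no layer sizes or hyperparameters.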

External IDs: dblp:journals/tnsm/ChenWLXLLJG26