Keywords: Encrypted Traffic Classification, Hypergraph Learning, Bipartite Graph
Abstract: In the era of pervasive encryption, encrypted traffic classification serves as a fundamental technique, underpinning diverse applications including intrusion detection and network management.
It is commonly approached with deep learning methods that rely on semantic feature extraction or on traffic interaction graphs. However, these approaches suffer from three major limitations: (1) semantic signal obfuscation under strong encryption, which suppresses distinguishable single-flow semantics and undermines accuracy and robustness; (2) inter-flow over-squashing, which constrains the expressivity of interaction graphs and degrades classification performance; and (3) the absence of intra–inter fusion combined with limited scalability, which prevents effective reconciliation of semantic and structural cues and hinders deployment on massive traffic graphs.
To address these challenges, we propose S$^2$-ETR (Semantic-Structure Encrypted Traffic Representation), a novel framework that integrates traffic semantics with the communication topology graph. The framework includes a Hyper-Bipartite Graph (HBG), which uses two branches to fuse topology and semantic features. The topology branch models structural relations with an IP–flow bipartite graph, decoupling flows from communication entities to mitigate overfitting. The semantic branch employs a lightweight adapter to capture flow semantics, enhancing cross-domain robustness; it also constructs semantic hyperedges via implicit hypergraph learning, propagating global semantic representations without extra information. Finally, a conditional probability–based hierarchical classification strategy is introduced to improve scalability on massive traffic graphs. Furthermore, we prove mathematically that HBG reduces long-range dependencies and over-squashing, leading to better efficacy and generalization than traditional topology graphs. Experimental results show that S$^2$-ETR consistently achieves state-of-the-art performance across 5 datasets of varying scales, outperforming 15 baselines by 2.4\%–17.1\% on encrypted application classification datasets and surpassing the best baseline by 9.2\% on the more complex and challenging IoT dataset.
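To make the IP–flow bipartite graph of the topology branch concrete, the following is a minimal sketch, not the authors' implementation. It assumes each flow is identified only by a hypothetical `(src_ip, dst_ip)` pair and that each flow node is linked to its two endpoint IP nodes; the function names and the sample addresses are illustrative. Under these assumptions, flows never connect to each other directly: any two flows that share an endpoint are exactly two hops apart (flow, IP, flow).

```python
from collections import defaultdict


def build_ip_flow_bipartite(flows):
    """Build adjacency maps for an IP-flow bipartite graph.

    flows: list of (src_ip, dst_ip) pairs (an assumed, simplified flow record).
    Returns (ip_to_flows, flow_to_ips); flow IDs are list indices.
    """
    ip_to_flows = defaultdict(set)
    flow_to_ips = {}
    for flow_id, (src, dst) in enumerate(flows):
        # Each flow node is connected only to its two endpoint IP nodes.
        flow_to_ips[flow_id] = {src, dst}
        ip_to_flows[src].add(flow_id)
        ip_to_flows[dst].add(flow_id)
    return dict(ip_to_flows), flow_to_ips


def flow_neighbors(flow_id, ip_to_flows, flow_to_ips):
    """Return the flows reachable in two hops (flow -> IP -> flow)."""
    reachable = set()
    for ip in flow_to_ips[flow_id]:
        reachable |= ip_to_flows[ip]
    reachable.discard(flow_id)  # a flow is not its own neighbor
    return reachable


# Illustrative flows; the addresses are made up for the example.
flows = [
    ("10.0.0.1", "8.8.8.8"),
    ("10.0.0.1", "1.1.1.1"),
    ("10.0.0.2", "1.1.1.1"),
]
ip_to_flows, flow_to_ips = build_ip_flow_bipartite(flows)
neighbors_of_0 = flow_neighbors(0, ip_to_flows, flow_to_ips)
neighbors_of_1 = flow_neighbors(1, ip_to_flows, flow_to_ips)
```

In this toy example, flow 0 and flow 2 share no endpoint, so they are linked only through flow 1. Keeping IPs and flows as separate node types is what lets a model attach features to flows while treating IPs purely as structural connectors.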
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2434