Keywords: Hypergraph Learning, Attention, Knowledge Distillation, Co-Distillation
TL;DR: An asymmetric contrastive scheme in which only the teacher processes both clean and perturbed views and fuses them via a learnable gating mechanism to produce high-quality distillation targets.
Abstract: Many real-world systems involve complex many-to-many relationships that are naturally represented as hypergraphs, from social networks to molecular interactions. While hypergraph neural networks (HGNNs) have shown promise, existing attention mechanisms fail to handle hypergraph-specific asymmetries among node-to-node, node-to-hyperedge, and hyperedge-to-node interactions, leading to suboptimal structural encoding. We introduce \textbf{CuCoDistill}, a novel framework that challenges fundamental assumptions in knowledge distillation by demonstrating that student models can systematically outperform their teachers through hypergraph-aware adaptive attention with provable spectral guarantees. Our approach features: (1) set-aware attention fusion that handles variable-sized hyperedge sets with an approximation error bound of $\epsilon\sqrt{|\mathcal{V}|}\max_i|\mathcal{E}_i|$; (2) a co-evolutionary unified architecture in which teacher and student jointly discover structural patterns in a single forward pass; and (3) theoretically grounded curriculum distillation based on hypergraph spectral properties. We prove that when the student's constrained attention aligns with the hypergraph's intrinsic spectral dimension, superior generalization emerges through beneficial regularization. Extensive experiments across nine benchmarks show that our students achieve up to 1.8\% higher accuracy than their teachers while delivering a 6.25× inference speedup and a 10× memory reduction, consistently outperforming state-of-the-art methods and establishing new efficiency-performance frontiers for hypergraph learning.
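The TL;DR's central mechanism is the teacher-side gated fusion of clean and perturbed views. Below is a minimal, hypothetical PyTorch sketch of what such a gate could look like; the class name `GatedTeacherFusion`, the dimensions, and the convex-combination design are our assumptions for illustration, not the paper's actual implementation.

```python
# Hedged sketch: asymmetric teacher-side view fusion with a learnable gate.
# All names and shapes here are illustrative assumptions, not from the paper.
import torch
import torch.nn as nn

class GatedTeacherFusion(nn.Module):
    """Fuse teacher embeddings of a clean and a perturbed hypergraph view."""

    def __init__(self, d_model: int):
        super().__init__()
        # Gate conditioned on both views; outputs per-dimension mixing weights in (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, h_clean: torch.Tensor, h_pert: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([h_clean, h_pert], dim=-1))
        # Convex combination of the two views serves as the distillation target.
        return g * h_clean + (1.0 - g) * h_pert

# Usage: only the teacher sees the perturbed view; the fused output
# would be used as the student's soft target.
teacher_fusion = GatedTeacherFusion(d_model=64)
h_clean = torch.randn(32, 64)   # teacher embeddings on the clean view
h_pert = torch.randn(32, 64)    # teacher embeddings on a perturbed view
targets = teacher_fusion(h_clean, h_pert)  # shape: (32, 64)
```

The asymmetry matters: because only the teacher processes the perturbed view, the student's forward pass stays cheap, which is consistent with the reported inference speedup and memory reduction.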
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 8334