HSelKD: Selective Knowledge Distillation for Hypergraphs using Optimal Transport

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: Knowledge distillation, Hypergraph, Optimal transport, Representation learning, Node classification
Abstract: Hypergraph Neural Networks (HGNNs) excel at modeling high-order dependencies through hyperedges, but their heavy inference cost limits deployment in latency-sensitive industrial scenarios. Knowledge Distillation (KD) offers a promising way to combine the expressiveness of graph-based models with the efficiency of lightweight Multi-Layer Perceptrons (MLPs). However, existing KD methods typically transfer the full output distribution of the teacher, overlooking the practical setting where only a subset of the teacher's knowledge is necessary or beneficial. To address this, we propose HSelKD, a selective KD framework that transfers task-relevant knowledge from an HGNN teacher to a lightweight MLP student. HSelKD leverages Inverse Optimal Transport to distill the most informative parts of the teacher's knowledge in a capacity-aware manner. We further introduce two principled variants: (1) Task-Aware Distillation, which specializes the student on task-relevant labels, and (2) Reject-Aware Distillation, which equips the student with the ability to abstain from uncertain or out-of-scope predictions. Extensive experiments on hypergraph and graph benchmarks show that HSelKD consistently outperforms lightweight baselines, matches the accuracy of structure-aware teachers, and delivers up to 53× faster inference with lower training cost and computational overhead. These results establish HSelKD as a practical and scalable solution for real-world, latency-constrained deployments.
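The paper's implementation is not reproduced on this page. As a rough illustration of the Task-Aware Distillation idea described in the abstract (specializing the student on a task-relevant label subset), the following is a minimal, hypothetical sketch: the teacher's and student's output distributions are restricted to the task-relevant classes, renormalized, and matched with a temperature-scaled KL divergence. The function name, temperature parameter, and NumPy formulation are illustrative assumptions, not the authors' code; the actual method additionally uses Inverse Optimal Transport for capacity-aware selection.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def task_aware_kd_loss(teacher_logits, student_logits, task_classes, T=2.0):
    """Hypothetical task-aware KD loss: temperature-scaled KL divergence
    between teacher and student distributions, restricted to (and
    renormalized over) the task-relevant label subset `task_classes`."""
    p_t = softmax(teacher_logits[:, task_classes] / T)  # teacher, subset-renormalized
    p_s = softmax(student_logits[:, task_classes] / T)  # student, subset-renormalized
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(np.mean(kl) * T ** 2)  # T^2 rescaling, as in standard KD
```

The loss is zero when the student already matches the teacher on the selected classes, and positive otherwise; classes outside `task_classes` contribute nothing, which is the sense in which the transfer is selective.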
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 12150