Prediction Is NOT Classification: On Formulation and Evaluation of Hyperedge Prediction

Published: 01 Jan 2024, Last Modified: 24 Jul 2025, ICDM (Workshops) 2024, CC BY-SA 4.0
Abstract: A hypergraph, which consists of nodes and hyperedges (i.e., subsets of nodes), naturally represents group relations, such as recipes consisting of ingredients, outfits consisting of fashion items, and collaborations among researchers. The hyperedge prediction (HP) problem, which involves predicting future or missing hyperedges, has gained attention for applications including recipe development, outfit recommendation, and collaborator search. However, because the number of hyperedge candidates is vast (about 2^n for n nodes), it is extremely challenging to identify the most promising ones among the entire candidate set. Thus, the problem is commonly reformulated as a classification task that distinguishes real hyperedges from artificially generated ones, which simplifies both training and evaluation. Our work offers three contributions regarding HP. First, we present an improved formulation that is semantically aligned with the prediction task, computationally feasible, and better suited to various applications. Second, we make striking observations based on this improved formulation: (a) performance in the classification formulation does not accurately reflect HP performance and is often negatively correlated with it, and (b) simple rule-based methods outperform advanced deep-learning approaches. Lastly, we present MHP, a novel HP method that utilizes masking-based training and outperforms all competing HP methods by up to 40%.
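To make the scale argument concrete, the following is a minimal Python sketch, not the authors' code. It counts the non-empty node subsets of an n-node hypergraph (2^n - 1, i.e., "about 2^n") and illustrates the classification reformulation by pairing each real hyperedge with one artificially generated node set. The function names and the size-matched uniform sampling scheme are assumptions chosen for illustration; the abstract does not specify how artificial hyperedges are generated.

```python
import itertools
import random


def count_candidates(n):
    """Number of non-empty node subsets of an n-node hypergraph: 2^n - 1."""
    return 2 ** n - 1


def enumerate_candidates(nodes):
    """Yield every non-empty subset of nodes; only feasible for tiny node sets."""
    for k in range(1, len(nodes) + 1):
        for subset in itertools.combinations(nodes, k):
            yield frozenset(subset)


def sample_negatives(nodes, real_hyperedges, rng):
    """Draw one artificial hyperedge per real one (hypothetical scheme).

    Each negative is a uniformly random node set with the same size as its
    real counterpart, rejected and redrawn if it coincides with a real
    hyperedge. Assumes at least one non-real set of each size exists.
    """
    real = set(real_hyperedges)
    negatives = []
    for edge in real_hyperedges:
        while True:
            candidate = frozenset(rng.sample(nodes, len(edge)))
            if candidate not in real:
                negatives.append(candidate)
                break
    return negatives


# Toy hypergraph: 5 nodes, 3 real hyperedges (made-up data for illustration).
nodes = list(range(5))
real_hyperedges = [frozenset({0, 1}), frozenset({1, 2, 3}), frozenset({0, 3, 4})]

all_candidates = list(enumerate_candidates(nodes))
negatives = sample_negatives(nodes, real_hyperedges, random.Random(0))

print("candidates for n=5:", len(all_candidates))
print("candidates for n=30:", count_candidates(30))
print("sampled negatives:", [sorted(e) for e in negatives])
```

In this toy setting a classifier would only ever be scored on the three real hyperedges and the three sampled negatives, whereas the prediction task ranges over every candidate subset. That gap between the two evaluation sets is what the abstract's distinction between classification and prediction refers to.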