TL;DR: We show that common link prediction benchmarks are biased by node degree, yielding misleadingly high scores for methods that overfit to node degree. We propose a correction and highlight its advantages.
Abstract: Link prediction---the task of distinguishing actual hidden edges from random unconnected node pairs---is one of the quintessential tasks in graph machine learning. Although link prediction is widely accepted as a universal benchmark and a downstream task for representation learning, the validity of the benchmark has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes. This produces a highly skewed evaluation that favors methods overly dependent on node degree. In fact, a ``null'' link prediction method based solely on node degree can yield nearly optimal performance in this setting. We propose a degree-corrected link prediction benchmark that offers a more reasonable assessment and better aligns with performance on the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.
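To make the bias concrete, here is a minimal sketch (not the authors' code) of the standard evaluation on a toy degree-heterogeneous graph: positives are sampled uniformly from edges, so a node appears in positives roughly in proportion to its degree, while negatives are uniform node pairs. The graph model, sample sizes, and the degree-product score are illustrative assumptions.

```python
# Illustrative sketch of the implicit degree bias in the standard
# link prediction benchmark; not the paper's reference implementation.
import random
import networkx as nx
from sklearn.metrics import roc_auc_score

G = nx.barabasi_albert_graph(n=2000, m=5, seed=0)  # heavy-tailed degrees
deg = dict(G.degree())

# Standard split: positives are random existing edges (for simplicity we
# do not remove them from the graph here), negatives are random
# unconnected node pairs sampled uniformly.
positives = random.sample(list(G.edges()), 500)
negatives = []
while len(negatives) < 500:
    u, v = random.sample(list(G.nodes()), 2)
    if not G.has_edge(u, v):
        negatives.append((u, v))

# "Null" predictor: score each pair only by the product of its degrees.
pairs = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)
scores = [deg[u] * deg[v] for u, v in pairs]
print("AUC of degree-only predictor:", roc_auc_score(labels, scores))
# On heavy-tailed graphs this AUC lands far above the 0.5 chance level,
# even though the predictor ignores all structure beyond degree.
```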
Lay Summary: From suggesting new friends to recommending products, machine learning models are increasingly used to predict connections in networks. But the way we test these systems has a serious flaw: it favors popular items, such as people who already have many connections. In network terms, these are the "high-degree" nodes, i.e., those linked to many others. As a result, even simple models that just recommend these well-connected nodes can appear to perform well. We found that this flaw comes from how test data is built: popular nodes are sampled more often as examples of true connections simply because they have more links, while non-connections are sampled randomly, regardless of how popular the nodes are. This gives models a shortcut: they can classify connections and non-connections within the test data based on popularity alone. Indeed, predictions based solely on popularity can achieve nearly optimal performance without learning any meaningful structure. To address this, we introduce a new testing method that balances the visibility of popular and less-popular nodes. This prevents models from relying only on popularity and encourages them to learn real patterns. Our approach improves the usefulness of machine learning systems in recommendation, scientific discovery, and other applications where identifying genuine connections is critical.
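One way to realize such a correction, sketched below under our own reading of the idea, is to sample negative pairs with the same degree bias that positives inherit, e.g., drawing each endpoint with probability proportional to its degree (as in a configuration model). The exact sampling scheme is an assumption here; the authors' reference implementation is in the linked repository.

```python
# Hedged sketch of degree-corrected negative sampling: negatives carry
# the same degree bias as uniformly sampled positive edges, removing the
# degree shortcut. Not the authors' reference implementation.
import random

def degree_biased_negatives(G, n_samples, rng=random.Random(0)):
    nodes = list(G.nodes())
    weights = [G.degree(v) for v in nodes]  # endpoint prob. proportional to degree
    negatives = []
    while len(negatives) < n_samples:
        u, v = rng.choices(nodes, weights=weights, k=2)
        if u != v and not G.has_edge(u, v):
            negatives.append((u, v))
    return negatives

# With such negatives, the degree-only predictor from the sketch above
# falls back toward 0.5 AUC, so a model must learn structure beyond
# node degree to score well.
```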
Link To Code: https://github.com/skojaku/degree-corrected-link-prediction-benchmark
Primary Area: Deep Learning->Graph Neural Networks
Keywords: link prediction, benchmark, bias, degree heterogeneity
Submission Number: 11330