Inherent Limits on Topology-Based Link Prediction

Published: 27 Jun 2023, Last Modified: 08 Sept 2023, Accepted by TMLR
Abstract: Link prediction systems (e.g. recommender systems) typically use graph topology as one of their main sources of information. However, automorphisms and related properties of graphs beget inherent limits in predictability. We calculate hard upper bounds on how well graph topology alone enables link prediction for a wide variety of real-world graphs. We find that in the sparsest of these graphs the upper bounds are surprisingly low, thereby demonstrating that prediction systems on sparse graph data are inherently limited and require information in addition to the graph topology.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We believe that we have made all the requested changes. Notable highlights:
* The introduction now reads much more cleanly, and the intro figure has been adjusted.
* We investigated GCN's source code and ran an experiment on GCN to verify that its AP scores were due to under-sampling negative edges. The experiment differed from the one we suggested in our rebuttal, but it was nonetheless quite conclusive.
* Section 3.1 (now Sections 4.1, 4.1.1, and 4.1.2) has been expanded and clarified to better discuss our assumptions and how link prediction limits arise from edges appearing equivalent to a classifier.
* The formulae in what used to be Section 4 (now Section 5) are better explained and simpler to read.

In the latest revision (September 8, 2023), a minor clarification was made to the proof in Appendix A.1.
Assigned Action Editor: ~Laurent_Massoulié1
Submission Number: 846