Complete-Tree Space Favors Data-Efficient Link Prediction

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We develop the complete-tree space that favors data-efficient link prediction on real-world networks due to its intrinsic hierarchical permutation symmetries.
Abstract: Link prediction is a fundamental problem for network-structured data. However, the prevalent research paradigm assumes abundant observed links, overlooking the more challenging scenario in which observed links are scarce and sample sizes are therefore insufficient. In real-world networks, hierarchical modularity, characterized by structured and nested connections, remains robust even when links are sparsely observed. To address the challenge of limited link samples, we propose leveraging hierarchical modularity as a prior structure. We introduce the complete-tree (CT) space, a discrete metric space with latent complete-tree structures, to formalize hierarchical modularity with an emphasis on its hierarchical permutation symmetry. Using group theory to quantify and compare the permutation symmetries of different spaces, we prove that the CT space admits a significantly lower sample-complexity bound than the commonly used Euclidean space. We develop leaf matching, a data-efficient network embedding that maps nodes onto the CT space and performs discrete optimization by reduction to decentralized search. Experiments verify the data efficiency of the CT space over other spaces. Moreover, leaf matching outperforms the state-of-the-art graph transformer in data-scarce scenarios while exhibiting excellent scalability. The code is available at: **https://github.com/KevinGao7/LeafMatching**.
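To make the notion of a latent complete-tree metric concrete, here is a minimal sketch (not the authors' implementation) of a distance in a complete b-ary tree: leaves are indexed consecutively, and the distance between two leaves is the path length through their lowest common ancestor. The branching factor `b` and depth `d` are illustrative parameters only, and the paper's actual CT-space construction and leaf-matching optimization may differ.

```python
def ct_distance(i: int, j: int, b: int = 2, d: int = 4) -> int:
    """Path distance between leaves i and j in a complete b-ary tree of depth d.

    Leaves are indexed 0 .. b**d - 1 in left-to-right order. Illustrative
    sketch only; `b` and `d` are hypothetical parameters, not paper values.
    """
    assert 0 <= i < b ** d and 0 <= j < b ** d
    # Walk down from the root: at each level the leaves split into b blocks;
    # the lowest common ancestor is the last level where i and j share a block.
    lca_depth = 0
    block = b ** d
    while lca_depth < d:
        block //= b
        if i // block != j // block:
            break
        lca_depth += 1
    return 2 * (d - lca_depth)

# Example: with b=2, d=4, sibling leaves are at distance 2, while leaves in
# opposite halves of the tree are at the maximum distance 2*d = 8.
print(ct_distance(0, 1))    # 2
print(ct_distance(0, 15))   # 8
```

Under such a metric, nodes embedded onto nearby leaves are separated by few tree edges, so a hierarchy over the leaves directly encodes nested community structure.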
Lay Summary: Connections are everywhere. While predicting missing connections is a well-studied fundamental problem, we question whether such predictions are feasible when only a minority of connections is known. Recognizing that connections typically form a hierarchical structure, we conclude that it is indeed possible. Specifically, we develop the complete-tree space, which incorporates latent hierarchical structures. Our theoretical and experimental results demonstrate that once we map the known connections to this space, the prediction becomes straightforward. For example, a naively constructed network embedding in the complete-tree space can even outperform the state-of-the-art neural network in protein interaction prediction when only 2% of connections are known. Given that hierarchy is a common property of real-world data, we foresee potential extensions of our complete-tree space to a wide range of domains, including natural language processing, visual modeling, hierarchical planning, and even the functional understanding of hierarchically arranged grid cells in the brain.
Link To Code: https://github.com/KevinGao7/LeafMatching
Primary Area: General Machine Learning->Representation Learning
Keywords: Complete-Tree Space, Network Embedding, Data-Efficient Learning, Link Prediction
Submission Number: 3067