Simplifying complex machine learning by linearly separable network embedding spaces

13 Mar 2026 (modified: 20 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Low-dimensional embeddings are a cornerstone of the modelling and analysis of complex networks. However, most existing approaches for mining network embedding spaces rely on computationally intensive machine learning systems to facilitate downstream analysis tasks. In contrast, in the field of Natural Language Processing, it was observed that word embedding spaces capture semantic relationships linearly, allowing for information retrieval using simple linear operations on word embedding vectors. Similar linear semantic relationships (i.e., the compositionality of embedding vectors) have also been observed in data embeddings from pre-trained vision-language models. This poses the question of why, in some cases, embedding methods lead to a linearly separable embedding space amenable to linear exploitation, while in other cases they do not. Here, we gain fundamental insight into the structure of network data that yields this linearity. We show that the more homophilic the network representation, the more linearly separable the corresponding network embedding space, yielding better downstream analysis results. We demonstrate the applicability of our insight on thirteen networks from multiple domains: six multi-label biological networks and seven single-label networks from the social, citation, and transportation domains. We believe that these fundamental insights into the structure of network data that enable its linear mining and exploitation are a foundation to build upon towards efficient and explainable mining of complex network data.
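To make the abstract's central claim concrete, the sketch below illustrates, on synthetic data, how higher homophily tends to coincide with higher linear separability of node embeddings. It is not the paper's method: the stochastic block model graphs, the spectral embedding, and the use of networkx and scikit-learn are our own assumptions for illustration. Edge homophily (the fraction of edges joining same-label nodes) is compared against the test accuracy of a plain linear classifier trained on the embeddings.

```python
# Minimal sketch (illustrative only, not the paper's pipeline): compare edge
# homophily with the linear separability of spectral node embeddings on a
# homophilic vs. a mixed stochastic block model graph.
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def edge_homophily(G, labels):
    """Fraction of edges whose two endpoints share the same label."""
    same = sum(labels[u] == labels[v] for u, v in G.edges())
    return same / G.number_of_edges()

def spectral_embedding(G, dim=16):
    """Node embeddings from the low eigenvectors of the normalized Laplacian."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]                # skip the trivial first eigenvector

# (p_in, p_out): within-block vs. between-block edge probabilities.
for p_in, p_out in [(0.15, 0.01), (0.08, 0.08)]:   # homophilic vs. mixed graph
    G = nx.stochastic_block_model([100, 100],
                                  [[p_in, p_out], [p_out, p_in]], seed=0)
    labels = np.array([G.nodes[n]["block"] for n in G.nodes()])
    h = edge_homophily(G, labels)
    X = spectral_embedding(G)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels,
                                              test_size=0.3, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"homophily = {h:.2f}   linear-classifier accuracy = {acc:.2f}")
```

Under these assumptions, the homophilic graph yields homophily close to 1 and near-perfect accuracy for the linear classifier, while the mixed graph sits near 0.5 on both measures, matching the qualitative relationship the abstract describes.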
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jicong_Fan2
Submission Number: 7916