Towards Advanced Unsupervised Representation Learning for Graph-Structured Data

Published: 01 Jan 2024, Last Modified: 25 Aug 2024, License: CC BY-SA 4.0
Abstract: Despite the unprecedented success of deep learning on graphs, that success is predominantly tethered to the quality and quantity of labeled datasets. In response to this bottleneck, unsupervised learning is gaining traction as a promising paradigm that circumvents the dependency on manual labeling. However, in most cases unsupervised methods achieve less satisfactory performance than their (semi-)supervised competitors, let alone surpass them. Motivated by this gap, this thesis studies advanced unsupervised representation learning for graph-structured data. Firstly, the use of higher-order topological information for unsupervised representation learning is explored. A fully unsupervised network alignment framework (HTC) is proposed, which shifts the focus of the alignment process from low-order to higher-order topological consistency. On five datasets, HTC consistently outperforms a wide variety of unsupervised and supervised methods while requiring the least or comparable running time. Secondly, the role of hard negative samples in the uniformity-tolerance dilemma of graph contrastive learning is explored. We propose a novel contrastive objective with a progressive hard negative masking scheme. The proposed objective is shown, both theoretically and empirically, to allow higher local tolerance and stronger contrastive effects, leading to higher-quality embedding distributions and considerable performance improvements in downstream node classification tasks. Thirdly, we study the identification of interdependence for cross-view mutual information maximization. We propose IDEAL, a simple yet effective framework that formulates cross-view interdependence from the perspective of information flow. Extensive empirical evidence validates the effectiveness of IDEAL: it consistently outperforms state-of-the-art self-supervised methods by considerable margins across seven benchmark datasets with diverse scales and properties. In short, this thesis presents a comprehensive exploration of unsupervised representation learning for graph-structured data. By pushing the boundaries of what is achievable without manual labeling, this work paves the way for more sophisticated, efficient, and effective graph analysis methodologies in a variety of applications, underscoring the critical role of unsupervised learning in the ongoing evolution of graph-based data science.