Abstract: Self-supervised learning has become a key method for training deep learning models when labeled data is scarce or unavailable. While graph machine learning holds great promise across various domains, the design of effective pretext tasks for self-supervised graph representation learning remains challenging. Contrastive learning, a popular approach in graph self-supervised learning, leverages positive and negative pairs to compute a contrastive loss function. However, current graph contrastive learning methods often struggle to fully use structural patterns and node similarities. To address these issues, we present a new method called Fused Gromov-Wasserstein Subgraph Contrastive Learning (FOSSIL). Our method integrates node-level and subgraph-level contrastive learning, seamlessly combining a standard node-level contrastive loss with the Fused Gromov-Wasserstein distance. This combination helps our method capture both node features and graph structure together. Importantly, our approach works well with both homophilic and heterophilic graphs and can dynamically create views for generating positive and negative pairs. Through extensive experiments on benchmark graph datasets, we show that FOSSIL outperforms or achieves competitive performance compared to current state-of-the-art methods.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We sincerely appreciate the reviewers’ and the Action Editor’s time and thoughtful feedback. Below, we address the remaining concerns raised by Reviewer NhfX:
- Our generator (the GAT model) is designed to avoid the need for manually selecting an augmentation strategy, which is a common challenge in SSL for graphs. Since the generator's purpose is to output a perturbed graph for contrastive learning, using a message-passing network is a natural choice. We have conducted a comprehensive analysis of different graph perturbation strategies and generator architectures in Table 5, where the approach used in FOSSIL achieves the best performance. This provides empirical justification for our design choice.
- Our encoder is equivariant; the sampling process introduces a break in equivariance. We have explicitly clarified this point in Section 4.5, ensuring that the discussion accurately reflects this important distinction.
- We have now included results on both OGBN-Arxiv and OGBN-Proteins in a newly added Appendix B. Some methods from Table 2 resulted in memory overflow on both datasets with our available resources, so we omitted them from Table 7.
We apologize for the delay; we encountered issues when trying to evaluate our method on OGBN-Proteins. Notably, we found that a forward pass on OGBN-Proteins is not possible with the 40GB of GPU memory available to us. Hence, we could not directly evaluate our method or any other baseline on the original graph. Our simplest solution was to extract a subgraph of similar size to OGBN-Arxiv. The results are in Table 2 of the manuscript.
The GAT, by definition, uses the adjacency matrix. Hence, as we detailed with equations in our previous response, "g being a GAT at the feature level applied on f(x)" means computing the graph attention weights from the original features and then applying these attention weights to the embeddings f(x). This corresponds to the model indicated as F-GAT-E in Table 5. We may not have emphasized it enough, but our goal is to create subgraph augmentations. Using only the identity matrix conceptually treats each subgraph as a set of disconnected nodes; by using the adjacency matrix, we also exploit the connectivity of subgraphs. We do not see a way to do so without message passing. Simply put, an attention between nodes is by definition a matrix that reflects the pairwise interactions of nodes, and applying this matrix to the nodes is in essence a message-passing step. We therefore do not see how to perform such augmentations without substantially changing our method. However, we understand your point and will consider, in future work, an alternative that avoids the additional message-passing step.
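To make the "attention at the feature level, applied on the embeddings" idea concrete, here is a minimal NumPy sketch. It is only illustrative and not the actual F-GAT-E implementation: the scoring function (a scaled dot product here, rather than GAT's learned attention mechanism), the function name `fgat_e_step`, and the use of a binary adjacency matrix with self-loops are all simplifying assumptions.

```python
import numpy as np

def fgat_e_step(X, H, A):
    """Illustrative sketch (not the paper's exact model): attention weights
    are computed from the original features X, then applied to the encoder
    embeddings H = f(X). A is a binary adjacency matrix with self-loops."""
    # Pairwise attention scores from the original features. A scaled dot
    # product stands in for GAT's learned attention scoring function.
    scores = X @ X.T / np.sqrt(X.shape[1])
    # Mask non-edges so attention respects subgraph connectivity.
    scores = np.where(A > 0, scores, -np.inf)
    # Row-wise softmax over each node's neighborhood.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Applying the attention matrix to the embeddings is, in essence,
    # one message-passing step.
    return weights @ H
```

Replacing `A` with the identity matrix would reduce the output to `H` up to the self-attention weight, i.e., each node would be treated in isolation, which is exactly the degenerate case discussed above.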
Code: https://github.com/sangaram/FOSSIL
Assigned Action Editor: ~Moshe_Eliasof1
Submission Number: 3177