Abstract: The Wasserstein Weisfeiler-Lehman~(WWL) graph kernel is a popular and effective approach, used in various kernel-based machine learning frameworks for practical applications involving graph data. It incorporates optimal transport geometry into the Weisfeiler-Lehman graph kernel to mitigate the information loss inherent in the aggregation strategies of graph kernels. While the WWL graph kernel demonstrates superior performance in many applications, it suffers from high computational complexity, at least $\mathcal{O}(n_{1} n_{2})$, where $n_{1}$ and $n_{2}$ denote the numbers of vertices in the two input graphs. This hinders the practical applicability of the WWL graph kernel, especially in large-scale settings. In this paper, we propose the \emph{Tree Wasserstein Weisfeiler-Lehman}~(TWWL) algorithm, which leverages a \emph{tree structure} to scale up the exact computation of the WWL graph kernel for graph data with categorical node labels. In particular, the TWWL algorithm has computational complexity $\mathcal{O}(n_{1} + n_{2})$, which enables its application to large-scale graphs. Numerical experiments demonstrate that the proposed algorithm compares favorably with baseline kernels while running several orders of magnitude faster than the classic WWL graph kernel. This paves the way for applications to large-scale datasets where the WWL kernel is computationally prohibitive.
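The abstract does not spell out how a tree yields exact linear-time computation, so the following is a minimal illustrative sketch under our own assumptions. It is not the authors' implementation (that lives in the linked repository). Assumptions: (i) the WWL ground distance for categorical labels is the Hamming distance over WL iterations $0, \dots, H$, normalized by $H+1$; (ii) every node carries uniform mass $1/n$; (iii) the kernel takes the form $\exp(-\lambda D)$. Because a WL label at iteration $h$ encodes the node's label at iteration $h-1$, the labels form a tree, and with edge weight $1/(2(H+1))$ the path length between two leaves equals the normalized Hamming distance. The Wasserstein distance under a tree metric has the closed form $\sum_e w_e \lvert \mu(\Gamma_e) - \nu(\Gamma_e) \rvert$ over subtrees $\Gamma_e$, which here reduces to comparing per-iteration label histograms. The names `wl_label_sequences`, `tree_wwl_distance`, `twwl_kernel`, `bruteforce_wwl_distance`, and the parameter `lam` are hypothetical.

```python
from collections import Counter
from itertools import permutations
import math


def wl_label_sequences(graphs, num_iters):
    """WL relabeling for graphs given as (adjacency lists, categorical labels).

    Returns, per graph, a list of label lists for iterations 0..num_iters.
    One compression table per iteration is shared across all graphs, so equal
    WL labels in different graphs receive the same integer id.
    """
    current = [list(labels) for _, labels in graphs]
    history = [[lbls] for lbls in current]
    for _ in range(num_iters):
        table = {}
        nxt = []
        for (adj, _), lbls in zip(graphs, current):
            # New label = (own previous label, sorted multiset of neighbour labels).
            sigs = [(lbls[v], tuple(sorted(lbls[u] for u in nbrs)))
                    for v, nbrs in enumerate(adj)]
            nxt.append([table.setdefault(s, len(table)) for s in sigs])
        current = nxt
        for hist, lbls in zip(history, current):
            hist.append(lbls)
    return history


def tree_wwl_distance(g1, g2, num_iters):
    """Wasserstein distance between uniform node distributions under the
    (assumed) normalized Hamming ground distance, via the tree closed form."""
    seq1, seq2 = wl_label_sequences([g1, g2], num_iters)
    n1, n2 = len(g1[1]), len(g2[1])
    # Edge weight chosen so leaf-to-leaf path length equals normalized Hamming distance.
    w = 1.0 / (2 * (num_iters + 1))
    total = 0.0
    for labels1, labels2 in zip(seq1, seq2):
        # Subtree mass under label l at this iteration = fraction of nodes carrying l.
        c1, c2 = Counter(labels1), Counter(labels2)
        total += w * sum(abs(c1[l] / n1 - c2[l] / n2) for l in c1.keys() | c2.keys())
    return total


def twwl_kernel(g1, g2, num_iters, lam=1.0):
    """Kernel of the assumed form exp(-lam * D) on the distance above."""
    return math.exp(-lam * tree_wwl_distance(g1, g2, num_iters))


def bruteforce_wwl_distance(g1, g2, num_iters):
    """Reference check for equal-size graphs: with uniform weights an optimal
    transport plan can be taken to be a permutation, so enumerate them all."""
    seq1, seq2 = wl_label_sequences([g1, g2], num_iters)
    n = len(g1[1])
    if n != len(g2[1]):
        raise ValueError("brute force assumes equal node counts")

    def hamming(i, j):
        return sum(a[i] != b[j] for a, b in zip(seq1, seq2)) / (num_iters + 1)

    return min(sum(hamming(i, p[i]) for i in range(n)) / n
               for p in permutations(range(n)))


if __name__ == "__main__":
    path = ([[1], [0, 2], [1, 3], [2]], ["A", "B", "B", "A"])
    star = ([[1, 2, 3], [0], [0], [0]], ["B", "A", "A", "B"])
    print(tree_wwl_distance(path, star, num_iters=2))        # ≈ 0.5
    print(bruteforce_wwl_distance(path, star, num_iters=2))  # ≈ 0.5
    print(twwl_kernel(path, star, num_iters=2))
```

The tree formula touches each node once per iteration, matching the linear dependence on $n_{1} + n_{2}$ stated above, whereas the brute-force reference is only practical for tiny graphs and serves solely to check that the two agree.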
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the descriptions of time-limit and memory-limit violations in the experimental results as pointed out by the reviewers, and corrected minor wording errors and typos.
Code: https://github.com/KeishiS/twwl
Assigned Action Editor: ~Rémi_Flamary1
Submission Number: 5395