Keywords: Graph Neural Networks, Deep learning, Optimization, Kernel Method
Abstract: Graph Neural Networks (GNNs) have become the de facto method for machine learning on graph data (e.g., social networks, protein structures, code ASTs), but they require significant time and resource to train. One alternative method is Graph Neural Tangent Kernel (GNTK), a kernel method that corresponds to infinitely wide multi-layer GNNs. GNTK's parameters can be solved directly in a single step, avoiding time-consuming gradient descent. Today, GNTK is the state-of-the-art method to achieve high training speed without compromising accuracy. Unfortunately, solving for the kernel and searching for parameters can still take hours to days on real-world graphs. The current computation of GNTK has running time $O(N^4)$, where $N$ is the number of nodes in the graph. This prevents GNTK from scaling to datasets that contain large graphs. Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving. This allows us to reduce the dominated computation bottleneck term from $O(N^4)$ to $O(N^3)$. (2) We apply sketching to further reduce the bottleneck term to $o(N^{\omega})$, where $\omega \approx 2.373$ is the exponent of current matrix multiplication. Experimentally, we demonstrate that our approaches speed up kernel learning by up to $19\times$ on real-world benchmark datasets.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=H3fDDCTS4n
11 Replies
Loading