TESH-GCN: Text Enriched Sparse Hyperbolic Graph Convolutional Networks

TMLR Paper913 Authors

03 Mar 2023 (modified: 24 Apr 2023). Rejected by TMLR.
Abstract: Heterogeneous networks, which connect informative nodes containing semantic information through different edge types, are routinely used to store and process information in various real-world applications. Graph Neural Networks (GNNs) and their hyperbolic variants provide a promising approach to encode such networks in a low-dimensional latent space through neighborhood aggregation and hierarchical feature extraction, respectively. However, these approaches typically ignore metapath structures and the available semantic information. Furthermore, these approaches are sensitive to the noise present in the training data. To tackle these limitations, in this paper, we propose the Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN). In TESH-GCN, we use semantic node information to identify relevant nodes and extract their local neighborhood and graph-level metapath features. This is done by applying a reformulated hyperbolic graph convolution layer to the sparse adjacency tensor, using the semantic node information as a connection signal. These extracted features, in conjunction with semantic features from the language model (for robustness), are used for the final downstream tasks. Experiments on various heterogeneous graph datasets show that our model outperforms the state-of-the-art approaches by a large margin on the task of link prediction. We also report a reduction in both training time and model parameters compared to existing hyperbolic approaches, achieved through the reformulated hyperbolic graph convolution. Furthermore, we illustrate the robustness of our model by experimenting with different levels of simulated noise in both the graph structure and text, and we present a mechanism to explain TESH-GCN's predictions by analyzing the extracted metapaths.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=gYo8y67uNe
Changes Since Last Submission:

Improved explanation of metapaths. Added to the Introduction section: "However, it is impractical to manually incorporate every one of the exhaustive set of graph metrics available, ... On the other hand, learning global metapaths could capture long-term relations between the nodes." (15 lines)

Added more methods to the related work (Section 2.1): "In addition, there have been recent efforts to develop application-specific approaches such as OntoProtein (Zhang et al., 2022) for protein ontologies and QA-GNN (Yasunaga et al., 2021) for question-answering knowledge graphs. These approaches use semantic node embeddings to enhance KG reasoning algorithms. However, they are limited to considering only the local node neighborhoods due to their reasoning objective and are not suitable for capturing global structural information."

Added a definition of the homogenized graph (Definition 2 in Section 3.1): "Homogenized Graph: Let $\mathcal{G} = (V, E)$ be a heterogeneous graph with a set of nodes $V$ and a set of edges $E$ with $K$ edge types. The homogenized version of $\mathcal{G}$, denoted as $\mathcal{G}'$, is obtained by replacing each edge type $e_k \in E$ with a common edge type $e$ using the mapping function $f(e_k) = e~\forall e_k \in E$. The edge distance between nodes $a$ and $b$ in the homogenized graph $\mathcal{G}'$, denoted by $dist(a, b)$, is defined as the shortest path length between $a$ and $b$ in the transformed graph. Here, the length of a path is defined as the number of edges along the path $a \rightarrow b$."
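To make Definition 2 concrete, below is a minimal sketch of homogenization and the edge distance $dist(a, b)$ using networkx; the toy graph, node names, and edge types are illustrative assumptions, not data from the paper.

```python
import networkx as nx

# Heterogeneous graph: a MultiGraph whose edges carry a 'type' attribute
# (hypothetical toy example).
G = nx.MultiGraph()
G.add_edge("a", "b", type="cites")
G.add_edge("b", "c", type="authored_by")
G.add_edge("c", "d", type="cites")

# Homogenization f(e_k) = e: map every edge type to a single common type,
# i.e., keep the node set and drop the edge-type distinction.
G_hom = nx.Graph()
G_hom.add_nodes_from(G.nodes)
G_hom.add_edges_from((u, v) for u, v, _ in G.edges(keys=True))

# dist(a, d): shortest path length in G', counted in number of edges.
print(nx.shortest_path_length(G_hom, "a", "d"))  # -> 3 (a -> b -> c -> d)
```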
Rewrote the subsection "Incorporating Semantics into Adjacency Tensor":

1. Updated Figure 4 with more detail and a new caption: "Adding semantic signals $t_i, t_j \in \mathbb{R}^{D=4}$ of nodes $i$ and $j$ to the sparse adjacency matrix $A_k \in \mathbb{R}^{8 \times 8}$ of a graph with $|V| = 8$ nodes and $|E| = 8$ edges. The nodes' independent semantic dimensions are added at their corresponding positions in the independent adjacency matrix copies. This addition focuses the subsequent convolution operation on the highlighted areas (due to the presence of non-zeros) to initiate the extraction of graph features at the location of the input nodes."

2. Refined the subsection text: "Incorporating Semantics into Adjacency Tensor: To integrate the nodes' textual information with the graph structure, we enhance the adjacency tensor of the heterogeneous graph with semantic features of the nodes. We obtain semantic signals using a pre-trained language model (LM) developed by Song et al. (2020) to encode each node's textual data into a vector $t \in \mathbb{R}^D$. It should be noted that the dimensions of a semantic vector are linearly independent, and hence each dimension corresponds to a unique independent semantic feature. Because of this, we cannot simply add the semantic vectors to the adjacency matrix. To overcome this issue, we propose a novel solution wherein we stack $D$ repetitions of the adjacency matrix of edge type $e_k$ to form the tensor $A_k$ and add each independent semantic dimension $t[d] \in t$ to a corresponding adjacency matrix $A_k[d] \in A_k$. Moreover, we add the semantic dimension at the node's position within the adjacency matrix to maintain positional consistency. This ensures that the adjacency tensor $A_k$ captures the nodes' semantic signals at their appropriate locations within the graph structure. Please refer to Figure 4 for an illustration of the entire process. An important consideration is that this operation increases the density of the adjacency matrix by at most $\frac{2}{|V|}$, since each $|V| \times |V|$ slice gains at most one dense row and one dense column ($\leq 2|V|$ of its $|V|^2$ entries). However, we observe that this increase has a negligible impact on the sparsity of real-world datasets (statistics provided in Table 1).
\begin{align}
t_i &= LM(s_i), \quad t_j = LM(s_j) \\
A_k[d,i,:] &= A_k[d,i,:] + t_i[d], \quad A_k[d,:,j] = A_k[d,:,j] + t_j[d] \quad \forall\, d = 1 \rightarrow D
\end{align}"
(A minimal NumPy sketch of this update appears at the end of this change list.)

Added descriptions of the baselines to Section 4.2: "The selection of our baseline models was driven by two key factors: the diversity of methods employed and their suitability to the datasets used in our experimental setup. To this end, we compare the performance of the proposed model with state-of-the-art models in the following categories: text-based (1-3), graph-based (4-6), and hybrid text-graph (7-9) approaches."

Added a description of GCN complexity (footnote 7 to Table 5): "Note that, in the case of GNN-based networks, the basic formulations use sparse graph representations, which makes their complexity linear in the number of edges. However, in practice, GPU support for sparse representations is limited, and hence GCNs need to operate on dense adjacency matrices, which leads to a time complexity of $O(|V|^2)$."

Moved the notations table and algorithms to the appendix: added Appendix A (Notations Table) and Appendix B (Algorithms).
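For concreteness, here is a minimal NumPy sketch of the semantic-enrichment update $A_k[d,i,:] \mathrel{+}= t_i[d]$, $A_k[d,:,j] \mathrel{+}= t_j[d]$ quoted above; the shapes, the random toy graph, and the stand-ins for the LM encodings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

V, D = 8, 4                                    # |V| nodes, D semantic dims
rng = np.random.default_rng(0)

# Sparse adjacency matrix of edge type k (hypothetical random toy graph).
adj_k = (rng.random((V, V)) < 0.15).astype(np.float32)

# Stand-ins for t_i = LM(s_i) and t_j = LM(s_j): any D-dim text encodings.
t_i = rng.standard_normal(D).astype(np.float32)
t_j = rng.standard_normal(D).astype(np.float32)

i, j = 2, 5                                    # positions of the input nodes

# Stack D copies of adj_k so each slice A_k[d] pairs with one
# linearly independent semantic dimension.
A_k = np.repeat(adj_k[None, :, :], D, axis=0)  # shape (D, V, V)

# A_k[d, i, :] += t_i[d] and A_k[d, :, j] += t_j[d] for all d:
# write each semantic dimension at the nodes' row/column positions.
A_k[:, i, :] += t_i[:, None]
A_k[:, :, j] += t_j[:, None]

# Each slice gains at most one dense row and one dense column, so the
# density increases by at most 2|V| / |V|^2 = 2/|V|, as noted above.
```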
Assigned Action Editor: ~Guillaume_Rabusseau1
Submission Number: 913