Keywords: Heterogeneous Text-rich Network, Pretrained Language Model, Hierarchical Prompt
Abstract: Representation learning on heterogeneous text-rich networks (HTRNs), which consist of multiple types of nodes and edges with each node associated with text data, is essential for various real-world applications. Given the success of pretrained language models (PLMs) in processing text data, recent efforts have integrated PLMs into HTRN representation learning, typically handling textual and structural information separately with PLMs and heterogeneous graph neural networks (HGNNs), respectively. However, this separation fails to capture critical interactions between the two types of data and necessitates alignment between distinct embedding spaces, which is often challenging. To address this, we propose HierPromptLM, a novel pure PLM-based framework that models text data and heterogeneous structures without separate processing. First, we develop Hierarchical Prompt, which employs prompt learning to integrate text data and structures at both the node and edge levels within a unified textual space. Building on this, we introduce two innovative HTRN-tailored pretraining tasks to fine-tune PLMs, emphasizing the heterogeneity of and interactions between the two types of data. Experiments on HTRN datasets demonstrate that HierPromptLM outperforms state-of-the-art methods, achieving significant improvements of up to 7.15% on node classification, 9.79% on link prediction, and 2.88% on graph classification. The code is available at https://anonymous.4open.science/r/HierPromptLM-code.
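The core idea of the abstract, serializing a node's text together with its typed neighborhood into prompts at both the node and edge levels so a single PLM can process text and structure jointly, can be sketched as below. This is a minimal illustration assuming a simple string-template prompt format; the function names (`node_prompt`, `edge_prompt`) and the bracketed type-marker convention are hypothetical, not the authors' actual implementation.

```python
def node_prompt(node_type, text, neighbors):
    """Node-level prompt: the node's own text followed by summaries of
    its typed neighbors, keeping heterogeneous structure in textual form.

    `neighbors` is a list of (edge_type, neighbor_type, neighbor_text).
    """
    parts = [f"[{node_type}] {text}"]
    for edge_type, nbr_type, nbr_text in neighbors:
        parts.append(f"<{edge_type}> [{nbr_type}] {nbr_text}")
    return " ".join(parts)


def edge_prompt(edge_type, src_prompt, dst_prompt):
    """Edge-level prompt: two node-level prompts joined by their typed
    relation, so the PLM sees the interaction, not the endpoints alone."""
    return f"{src_prompt} <{edge_type}> {dst_prompt}"


# Example: an academic HTRN with paper, author, and venue node types.
paper = node_prompt("paper", "Graph neural networks for drug discovery.",
                    [("written_by", "author", "A. Smith")])
venue = node_prompt("venue", "NeurIPS 2023", [])
print(edge_prompt("published_in", paper, venue))
```

In a full pipeline, such prompts would be fed to a PLM whose output embeddings are fine-tuned with structure-aware pretraining objectives, rather than training a separate HGNN on a different embedding space.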
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 10795