Graph as New Language: LLM-Based Graph Learning with Node-to-Center Path Sequences as Training Corpus

ACL ARR 2025 May Submission298 Authors

10 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Graph learning is widely encountered in real-world applications. Existing approaches typically combine graph neural networks with NLP methods, and more recently with large language models (LLMs), to encode node texts. However, this two-stage paradigm suffers from suboptimal alignment between textual and structural features. Since LLMs are probabilistic models that excel at next-word prediction and are not inherently designed for graphs, we propose a new perspective that treats a graph as a new language, enabling language models to predict node sequences learned from the graph structure. Unlike natural language, which offers coherent and abundant corpora, graphs do not inherently provide structured, meaningful node orders, making it challenging to construct a corpus of high-quality node sequences. To address this problem, we design PathGLM (Path-based Graph Language Model), which first builds a community-centric corpus that constrains path selection to the scope of each community. Next, we extract structurally meaningful node-to-center paths and feed them into LLMs to learn the grammar of the graph language; these paths also serve as prefixes during fine-tuning. Experimental results show that PathGLM improves semantic–structure integration and achieves state-of-the-art performance.
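The corpus-construction idea described in the abstract — community-restricted node-to-center paths serialized into node sequences — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy graph, the community assignment, and the choice of center nodes are all assumptions (in practice communities would come from a detection algorithm such as Louvain, and centers from some centrality criterion).

```python
from collections import deque

# Toy text-attributed graph as adjacency lists (illustrative, not from the paper).
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
# Assumed community assignment and one representative "center" node per community.
communities = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
centers = {0: "C", 1: "E"}

def node_to_center_path(graph, communities, centers, start):
    """BFS shortest path from `start` to its community's center,
    restricted to nodes within the same community."""
    comm = communities[start]
    target = centers[comm]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph[node]:
            if communities[nxt] == comm and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # center unreachable within the community

# One node sequence per node: each line could serve as a training example
# (or fine-tuning prefix) in a path-based corpus.
corpus = [" ".join(node_to_center_path(graph, communities, centers, n)) for n in graph]
```

Restricting BFS to a single community keeps each sequence short and locally coherent, which is the stated motivation for the community-centric corpus.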
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: graph learning, text attributed graphs, large language models, graph language, structure-semantic integration
Contribution Types: Model analysis & interpretability
Languages Studied: English
Keywords: graph learning, text attributed graphs, large language models, graph language, structure-semantic integration
Submission Number: 298