Abstract: Language Models (LMs) are increasingly challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs), in graph learning tasks. Following this trend, we propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks, without requiring any architectural modification. By preserving the LM's original architecture, our approach retains a key benefit of LM instruction tuning: the ability to jointly train on diverse datasets, fostering greater flexibility and efficiency. To achieve this, we introduce two key augmentation strategies: (1) enriching the LM's input with topological and semantic retrieval methods, which provide richer contextual information, and (2) guiding the LM's classification process with a lightweight GNN classifier that effectively prunes class candidates. Our experiments on real-world datasets show that backbone Flan-T5 LMs equipped with these augmentation strategies outperform SOTA text-output node classifiers and are comparable to top-performing vector-output node classifiers. By bridging the gap between specialized node classifiers and general LMs, this work paves the way for more versatile and widely applicable graph learning models. We will open-source the code upon publication.
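For concreteness, below is a minimal, hypothetical Python sketch of the two augmentation strategies named in the abstract (retrieval-based input enrichment and GNN-based class-candidate pruning). It assumes PyTorch; all function names, the prompt template, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two augmentation strategies: (1) enrich the LM's
# input with topologically and semantically retrieved context, and (2) prune
# class candidates with a lightweight GNN classifier. All names are illustrative.
import torch
import torch.nn.functional as F


def topological_neighbors(adj: dict, node: int, hops: int = 1) -> list:
    """Collect neighbors within `hops` hops of `node` via simple BFS expansion."""
    frontier, seen = {node}, {node}
    for _ in range(hops):
        frontier = {m for n in frontier for m in adj.get(n, [])} - seen
        seen |= frontier
    return sorted(seen - {node})


def semantic_neighbors(query_emb: torch.Tensor, all_embs: torch.Tensor, k: int = 5) -> list:
    """Return indices of the k nodes most cosine-similar to the query embedding.
    (May include the query node itself; a real pipeline would exclude it.)"""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), all_embs)
    return sims.topk(k).indices.tolist()


def prune_candidates(gnn_logits: torch.Tensor, label_names: list, k: int = 3) -> list:
    """Keep only the lightweight GNN's top-k predicted classes as candidates."""
    top = torch.topk(gnn_logits, k).indices.tolist()
    return [label_names[i] for i in top]


def build_prompt(node_text: str, neighbor_texts: list, candidate_labels: list) -> str:
    """Assemble an instruction-style prompt for a text-output LM (e.g., Flan-T5)."""
    context = "\n".join(f"- {t}" for t in neighbor_texts)
    labels = ", ".join(candidate_labels)
    return (
        f"Classify the node into one of: {labels}.\n"
        f"Related nodes:\n{context}\n"
        f"Node: {node_text}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy graph and embeddings, purely for illustration.
    adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
    embs = torch.randn(4, 8)
    nbrs = set(topological_neighbors(adj, 0)) | set(semantic_neighbors(embs[0], embs, k=2))
    labels = prune_candidates(torch.randn(5), ["cs.AI", "cs.CL", "cs.LG", "cs.CV", "cs.DB"], k=3)
    print(build_prompt("Paper title and abstract ...",
                       [f"text of node {n}" for n in sorted(nbrs)], labels))
```

The resulting prompt would then be fed to the off-the-shelf LM, whose free-text answer is matched against the pruned candidate set.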
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, fine-tuning, prompting, retrieval-augmented generation, graph-based methods, data augmentation
Contribution Types: NLP engineering experiment
Languages Studied: English
Previous URL: https://openreview.net/forum?id=RpFxevb22r
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: As we noted in the "review issue report" from the previous round of review, we believe that (1) some reviewers did not demonstrate expertise in the area and (2) did not acknowledge critical evidence presented in the author response. A more detailed justification can be found in that report from the previous round of review.
Software: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 8
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Sections 5.1 and 5.2
B2 Discuss The License For Artifacts: No
B2 Elaboration: We did not scrape or collect data from any source. We did not repackage any existing dataset. All the datasets and backbone models used in this paper are publicly accessible.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 5.1 states that we use the datasets following existing works.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The datasets used by this paper are publicly accessible and do not contain personally identifying info or offensive content.
B5 Documentation Of Artifacts: No
B5 Elaboration: We did not create any new dataset.
B6 Statistics For Data: Yes
B6 Elaboration: Section E (appendix)
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section G.1 (appendix)
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Sections 5.1 and F (appendix)
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5.2; all results are reported as the mean and standard deviation over 5 runs.
C4 Parameters For Packages: Yes
C4 Elaboration: Sections 5.1 and F (appendix)
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: We used an LLM only to paraphrase and polish the authors' original content, without proposing any new content. According to the "AI Writing/Coding Assistance Policy", such use does not need to be disclosed.
Author Submission Checklist: yes
Submission Number: 425