Abstract: Language Models (LMs) are increasingly challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs), in graph learning tasks. Following this trend, we propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks, without requiring any architectural modification. By preserving the LM's original architecture, our approach retains a key benefit of LM instruction tuning: the ability to jointly train on diverse datasets, fostering greater flexibility and efficiency. To achieve this, we introduce two key augmentation strategies: (1) enriching the LM's input with topological and semantic retrieval methods, which provide richer contextual information, and (2) guiding the LM's classification process with a lightweight GNN classifier that effectively prunes class candidates. Our experiments on real-world datasets show that backbone Flan-T5 LMs equipped with these augmentation strategies outperform SOTA text-output node classifiers and are comparable to top-performing vector-output node classifiers. By bridging the gap between specialized node classifiers and general LMs, this work paves the way for more versatile and widely applicable graph learning models. We will open-source the code upon publication.
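For concreteness, below is a minimal, hypothetical Python sketch of the two augmentation strategies named in the abstract (retrieval-based input enrichment and GNN-based class-candidate pruning). It assumes PyTorch; all function names, the prompt template, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two augmentation strategies: (1) enrich the LM's
# input with topologically and semantically retrieved context, and (2) prune
# class candidates with a lightweight GNN classifier. All names are illustrative.
import torch
import torch.nn.functional as F


def topological_neighbors(adj: dict, node: int, hops: int = 1) -> list:
    """Collect neighbors within `hops` hops of `node` via simple BFS expansion."""
    frontier, seen = {node}, {node}
    for _ in range(hops):
        frontier = {m for n in frontier for m in adj.get(n, [])} - seen
        seen |= frontier
    return sorted(seen - {node})


def semantic_neighbors(query_emb: torch.Tensor, all_embs: torch.Tensor, k: int = 5) -> list:
    """Return indices of the k nodes most cosine-similar to the query embedding.
    (May include the query node itself; a real pipeline would exclude it.)"""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), all_embs)
    return sims.topk(k).indices.tolist()


def prune_candidates(gnn_logits: torch.Tensor, label_names: list, k: int = 3) -> list:
    """Keep only the lightweight GNN's top-k predicted classes as candidates."""
    top = torch.topk(gnn_logits, k).indices.tolist()
    return [label_names[i] for i in top]


def build_prompt(node_text: str, neighbor_texts: list, candidate_labels: list) -> str:
    """Assemble an instruction-style prompt for a text-output LM (e.g., Flan-T5)."""
    context = "\n".join(f"- {t}" for t in neighbor_texts)
    labels = ", ".join(candidate_labels)
    return (
        f"Classify the node into one of: {labels}.\n"
        f"Related nodes:\n{context}\n"
        f"Node: {node_text}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy graph and embeddings, purely for illustration.
    adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
    embs = torch.randn(4, 8)
    nbrs = set(topological_neighbors(adj, 0)) | set(semantic_neighbors(embs[0], embs, k=2))
    labels = prune_candidates(torch.randn(5), ["cs.AI", "cs.CL", "cs.LG", "cs.CV", "cs.DB"], k=3)
    print(build_prompt("Paper title and abstract ...",
                       [f"text of node {n}" for n in sorted(nbrs)], labels))
```

The resulting prompt would then be fed to the off-the-shelf LM, whose free-text answer is matched against the pruned candidate set.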
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, fine-tuning, prompting, retrieval-augmented generation, graph-based methods, data augmentation
Contribution Types: NLP engineering experiment
Languages Studied: English
Previous URL: https://openreview.net/forum?id=RpFxevb22r
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: As we noted in the "review issue report" from the previous round of review, we believe that (1) some reviewers did not demonstrate expertise in the area and (2) did not acknowledge critical evidence presented in the author response. A more detailed justification can be found in that report from the previous round of review.
Software: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 8
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Sections 5.1 and 5.2
B2 Discuss The License For Artifacts: No
B2 Elaboration: We did not scrape or collect data from any source. We did not repackage any existing dataset. All the datasets and backbone models used in this paper are publicly accessible.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 5.1 states that we use the datasets following existing works.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The datasets used by this paper are publicly accessible and do not contain personally identifying info or offensive content.
B5 Documentation Of Artifacts: No
B5 Elaboration: We did not create any new dataset.
B6 Statistics For Data: Yes
B6 Elaboration: Section E (appendix)
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section G.1 (appendix)
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Sections 5.1 and F (appendix)
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5.2; all results are reported as the mean and standard deviation over 5 runs.
C4 Parameters For Packages: Yes
C4 Elaboration: Sections 5.1 and F (appendix)
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: We used an LLM only to paraphrase and polish the authors' original content, without proposing any new content. According to the "AI Writing/Coding Assistance Policy", such use does not need to be disclosed.
Author Submission Checklist: yes
Submission Number: 425