Abstract: Previous research has sought to enhance the graph reasoning capabilities of LLMs by supervised fine-tuning on synthetic graph data. While these approaches led to specialized LLMs better at solving graph algorithm problems, we don't need LLMs for shortest path per se: we need generalization from synthetic graph data to real-world tasks with implicit graph structures. In this work, we propose to unlock generalizable graph learning through post-training alignment with synthetic data. We first design solution-based and process-based rewards for synthetic graph problems: instead of rigidly memorizing response patterns as in direct fine-tuning, we posit that post-training alignment would help LLMs grasp the essentials underlying graph reasoning and alleviate overfitting on synthetic data. We employ post-training alignment algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data. We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures, such as multi-hop QA and structured planning. Extensive experiments demonstrate that our post-training alignment recipe leads to statistically significant improvements on 5 datasets, with an average gain of 12.9% over baseline settings. Further analysis reveals that process-based rewards consistently outperform solution-based rewards on synthetic data but not on real-world tasks, and that compositionality and explainable intermediate steps remain a critical challenge even after post-training alignment.
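To make the reward distinction concrete, below is a minimal hypothetical sketch of what a solution-based versus process-based reward could look like for a synthetic shortest-path problem. The function names, response format, and scoring weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: solution-based vs. process-based rewards for a
# synthetic shortest-path task. Names and weights are assumptions.
from collections import deque

def shortest_path_length(graph: dict, src: str, dst: str) -> int:
    """BFS shortest-path length on an unweighted graph (ground truth)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return -1  # unreachable

def solution_reward(answer: int, graph: dict, src: str, dst: str) -> float:
    """Solution-based: reward 1.0 iff the final answer matches ground truth."""
    return float(answer == shortest_path_length(graph, src, dst))

def process_reward(path: list, graph: dict, src: str, dst: str) -> float:
    """Process-based: score intermediate steps, not just the final answer.
    Here: the fraction of claimed hops that are valid edges, plus a bonus
    when the path starts at src and ends at dst."""
    if len(path) < 2:
        return 0.0
    valid_hops = sum(b in graph.get(a, []) for a, b in zip(path, path[1:]))
    step_score = valid_hops / (len(path) - 1)
    endpoint_bonus = 0.5 if (path[0] == src and path[-1] == dst) else 0.0
    return 0.5 * step_score + endpoint_bonus

# Toy usage on a 4-node graph
g = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(solution_reward(2, g, "A", "D"))               # 1.0: correct answer
print(process_reward(["A", "B", "D"], g, "A", "D"))  # 1.0: all hops valid
print(process_reward(["A", "D"], g, "A", "D"))       # 0.5: right endpoints, invalid hop
```

Such a process reward gives partial credit to responses whose reasoning trace is locally consistent even when the final answer is wrong, which is one plausible way to operationalize the solution-based/process-based split described in the abstract.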
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: transfer, interpretability, post-training alignment, graph learning, real-world graph tasks, reinforcement learning
Contribution Types: Model analysis & interpretability, Reproduction study, Data analysis
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3
B2 Discuss The License For Artifacts: No
B2 Elaboration: The models, code, and datasets are open-source for research purposes.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: We did not discuss the intended use in the paper, but we use the models, code, and datasets for research purposes only.
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: Yes
B6 Elaboration: 3.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 3.2
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 3.2
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: Yes
C4 Elaboration: 3
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 939