Keywords: LLMs, graphs
Abstract: Graphs are essential tools for modeling complex relationships. While prior research found that earlier generations of large language models (LLMs) struggled with basic graph primitives, we find that the situation has changed with modern state-of-the-art (SOTA) LLMs, which excel at these tasks. Given these advances, we propose a more challenging evaluation problem: graph modification, a foundational, interpretable, and non-trivial problem in which an LLM must determine the outcome of adding or deleting a given sequence of nodes or edges, and potentially then compute a property of the resulting modified graph. We introduce GraphModQA, a novel benchmark dataset comprising graph modification question-answer pairs designed to rigorously test LLMs’ abilities in graph manipulation and dynamic reasoning. Our results show that while SOTA LLMs perform well on static graph property tasks, their accuracy degrades on graph modification tasks. Performance is particularly low as the number of modifications increases and when the graph is represented as an adjacency matrix, an essential encoding not explored in previous work. We provide new techniques for improving performance on graph modification tasks, and we introduce Modify and Print (MAP) prompting, which asks models to output the intermediate adjacency matrix at each step and markedly improves their performance. Our findings highlight a critical gap in current LLM capabilities regarding dynamic graph reasoning tasks and underscore the potential of techniques like MAP prompting to mitigate these challenges.
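To make the task concrete, the sketch below shows what applying a sequence of modifications to an adjacency matrix looks like when every intermediate matrix is recorded, which is the kind of step-by-step output MAP prompting asks a model to produce. It is an illustration only, not the benchmark's code or prompt format: the operation names (`add_edge`, `remove_edge`, `add_node`, `remove_node`), the function names, and the restriction to undirected, unweighted graphs are all assumptions made here.

```python
from copy import deepcopy

def apply_modification(adj, op, *args):
    """Return a new adjacency matrix after one modification.

    Assumes an undirected, unweighted graph stored as a list of lists.
    The operation names are hypothetical, chosen for this illustration.
    """
    adj = deepcopy(adj)  # leave the caller's matrix untouched
    if op == "add_edge":
        u, v = args
        adj[u][v] = adj[v][u] = 1
    elif op == "remove_edge":
        u, v = args
        adj[u][v] = adj[v][u] = 0
    elif op == "add_node":
        for row in adj:
            row.append(0)                 # new column for the new node
        adj.append([0] * (len(adj) + 1))  # new all-zero row
    elif op == "remove_node":
        (u,) = args
        # Drop row u and column u; later nodes shift down by one index.
        adj = [
            [x for j, x in enumerate(row) if j != u]
            for i, row in enumerate(adj)
            if i != u
        ]
    else:
        raise ValueError(f"unknown operation: {op}")
    return adj

def modify_and_print(adj, modifications):
    """Apply modifications in order, printing the matrix after each step."""
    history = []
    for step, (op, *args) in enumerate(modifications, start=1):
        adj = apply_modification(adj, op, *args)
        history.append(adj)
        print(f"Step {step}: {op} {tuple(args)}")
        for row in adj:
            print(" ".join(str(x) for x in row))
    return history

if __name__ == "__main__":
    start = [[0, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
    modify_and_print(start, [("add_edge", 1, 2), ("remove_edge", 0, 1)])
```

Recording each intermediate matrix makes an error at any single step visible instead of leaving it hidden inside the final answer.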
Supplementary Material: zip
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7321