05 Feb 2025 · CC BY 4.0
Large Language Models (LLMs) have achieved great success in various reasoning tasks. However, their capacity for graph reasoning remains poorly understood. Although recent theoretical analyses suggest that LLMs can, in principle, perform complex graph tasks, empirical evaluations reveal numerous failures. To bridge this gap, we revisit the graph reasoning ability of LLMs by introducing a new, balanced, and comprehensive benchmark. Through systematic experimentation, we identify key factors influencing performance, including node connectivity types, graph sizes, graph descriptions, and node naming methods. We further demonstrate the impact of training data, model size, and fine-tuning on graph reasoning. All implementations and datasets are publicly available.
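As a rough illustration of two of the factors the abstract mentions — graph descriptions and node naming methods — the sketch below serializes the same graph into an LLM prompt either as an edge list or as per-node adjacency lists, with integer or named nodes. This is a hypothetical example for intuition only; the function names and formats are not from the paper's released code.

```python
# Hypothetical sketch of two graph-description formats for LLM prompts.
# Neither format nor the helper names come from the paper's benchmark.

def edge_list_description(edges, names=None):
    """Describe an undirected graph as one sentence per edge."""
    label = (lambda v: names[v]) if names else str
    return "\n".join(
        f"Node {label(u)} is connected to node {label(v)}."
        for u, v in edges
    )

def adjacency_description(edges, num_nodes, names=None):
    """Describe the same graph via a neighbor list for each node."""
    label = (lambda v: names[v]) if names else str
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return "\n".join(
        f"Node {label(v)}: neighbors {', '.join(label(w) for w in sorted(nbrs))}"
        for v, nbrs in adj.items()
    )

edges = [(0, 1), (1, 2)]
print(edge_list_description(edges))
print(adjacency_description(edges, 3, names=["Alice", "Bob", "Carol"]))
```

Benchmarks of this kind typically hold the underlying graph fixed and vary only the textual encoding, so that any accuracy difference can be attributed to the description or naming scheme rather than the task itself.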