Keywords: Algorithmic Reasoning; Large Language Models; Agent
TL;DR: This work presents a new benchmark for LLMs on graph reasoning, shows the importance of input representation, and introduces an adaptive method to improve performance.
Abstract: Large Language Models (LLMs) are increasingly applied to tasks involving structured data, such as graphs. However, their ability to perform complex algorithmic reasoning over graph-structured inputs remains under-explored. Existing benchmarks typically focus on basic reasoning over small graphs or code generation for graph tasks, but they do not provide models with direct access to graph-structured data, which limits a comprehensive evaluation of their graph reasoning capabilities.
To address this gap, we introduce **Graph Theory Bench (GT Bench)**, a challenging new benchmark featuring 44 diverse graph problem types (ranging from connectivity to minimum-cost flow) across over 100,000 instances with varied input representations (natural language, structured language, adjacency list, adjacency matrix). GT Bench is designed specifically to evaluate the ability of LLMs to perform multi-step algorithmic reasoning on graph-structured tasks.
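To make the varied input representations concrete, here is an illustrative sketch showing one small graph serialized three ways; the exact serialization formats GT Bench uses are not specified in this abstract, so the encodings below are assumptions.

```python
# Hypothetical sketch (not GT Bench's actual serializers): the same
# 4-node undirected graph rendered in three of the input-representation
# styles the benchmark varies.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n = 4

# Adjacency list: each node mapped to its sorted neighbors.
adj_list = {v: sorted({b for a, b in edges if a == v} |
                      {a for a, b in edges if b == v}) for v in range(n)}

# Adjacency matrix: an n x n 0/1 grid, symmetric for undirected graphs.
adj_matrix = [[0] * n for _ in range(n)]
for a, b in edges:
    adj_matrix[a][b] = adj_matrix[b][a] = 1

# Natural-language edge description.
nl = "; ".join(f"node {a} is connected to node {b}" for a, b in edges)

print(adj_list)       # {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(adj_matrix[0])  # [0, 1, 0, 1]
print(nl)
```

The same underlying graph can thus look very different to a model depending on the chosen encoding, which is the variable the benchmark controls.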
Our evaluation uncovers a critical dependency between LLM performance and the chosen input graph representation, which varies with graph structure (e.g., density, topology). Based on these findings, we propose the **Graph Theory Agent (GTA)**, a novel framework that enhances LLM graph reasoning by employing an adaptive input representation selector and decomposing the algorithmic solution into manageable sub-steps. Experiments demonstrate that GTA significantly improves the ability of LLMs to solve complex graph problems.
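A minimal sketch of the kind of density-driven representation choice these findings suggest; the heuristic, threshold, and function name below are hypothetical, as the abstract does not describe GTA's actual selector.

```python
def pick_representation(num_nodes: int, num_edges: int,
                        dense_threshold: float = 0.5) -> str:
    """Hypothetical heuristic: serialize dense graphs as an adjacency
    matrix (compact, fixed-size rows) and sparse graphs as an adjacency
    list (avoids emitting mostly-zero rows)."""
    # Maximum edges in a simple undirected graph with no self-loops.
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = num_edges / max_edges if max_edges else 0.0
    return "adjacency_matrix" if density >= dense_threshold else "adjacency_list"

print(pick_representation(10, 40))  # density 40/45 ~ 0.89 -> adjacency_matrix
print(pick_representation(10, 9))   # density 9/45 = 0.20 -> adjacency_list
```

Any real selector could of course condition on more than density (topology, problem type, model), but this shows the basic shape of an adaptive representation choice.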
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 12353