Text2GraphBench: A Comprehensive Benchmark for Evaluating Text-Instructed Graph Generation with Large Language Models

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Benchmark, Graph Generation, Large Language Models, Text-to-Graph Generation
Abstract: The rise of Large Language Models (LLMs) is driving a paradigm shift in graph generation, from traditional statistical modeling to the emerging paradigm of Text-instructed Graph Generation. However, this research field faces a critical bottleneck: a severe lack of benchmarks designed specifically for the new paradigm, which prevents reliable, in-depth analysis of the capabilities of existing models. To address this issue, we introduce Text2GraphBench, a comprehensive benchmark designed to evaluate and analyze the performance of LLMs on this task. At the core of Text2GraphBench is a constraint-centered methodology for benchmark curation and evaluation. For dataset curation, we pioneer a "graph-to-constraint, constraint-to-text" generation pipeline, building a large-scale, multi-domain dataset in which every textual instruction corresponds to a precisely verifiable constraint. For the evaluation system, we propose a novel, constraint-based three-dimensional framework that moves beyond traditional similarity comparisons, assessing generated graphs along the axes of Validity, Semantic Fidelity, and Novelty in a thorough and quantifiable manner. We conduct extensive evaluations of a range of mainstream LLMs using Text2GraphBench, and our results provide the first systematic account of the current capabilities, strengths, and challenges of these models. We hope Text2GraphBench will give the community a valuable tool for quantifying model capabilities and inspire future research. Our datasets, code, and analysis results are fully open-sourced.
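To make the "precisely verifiable constraint" idea concrete, the sketch below shows one plausible way a Validity check could verify a generated graph against machine-checkable constraints. The constraint names (`num_nodes`, `num_edges`, `connected`) and the edge-list encoding are illustrative assumptions, not the benchmark's actual specification or API.

```python
# Hypothetical sketch of a constraint-based Validity check; constraint
# names and graph encoding are assumptions, not Text2GraphBench's API.

def check_constraints(edges, constraints):
    """Verify a generated graph (list of (u, v) edges) against constraints.

    Returns a dict mapping each constraint name to True/False.
    """
    nodes = {u for e in edges for u in e}
    results = {}
    if "num_nodes" in constraints:
        results["num_nodes"] = len(nodes) == constraints["num_nodes"]
    if "num_edges" in constraints:
        results["num_edges"] = len(edges) == constraints["num_edges"]
    if "connected" in constraints:
        # Check connectivity with a depth-first traversal from any node.
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        seen, stack = set(), ([next(iter(nodes))] if nodes else [])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj.get(n, ()))
        results["connected"] = (seen == nodes) == constraints["connected"]
    return results

generated = [(0, 1), (1, 2), (2, 0), (2, 3)]
spec = {"num_nodes": 4, "num_edges": 4, "connected": True}
print(check_constraints(generated, spec))
```

Because each constraint is a deterministic predicate over the generated graph, Validity can be scored exactly rather than by graph-similarity heuristics, which is the motivation the abstract gives for the constraint-centered design.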
Primary Area: datasets and benchmarks
Submission Number: 25069