Keywords: large language model, text-to-graph, diffusion model, graph generation
Abstract: Text-to-graph generation, which aims at controlled graph generation based on natural language instructions, holds significant application potential in real-world scenarios such as drug discovery. However, existing generative models fall short of text-to-graph generation in two respects: i) language model-based generative models struggle to generate complex graph structures, and ii) graph-based generative models mainly focus on unconditional graph generation or conditional generation with simple conditions, falling short of understanding and following human instructions. In this paper, we tackle the text-to-graph generation problem by employing graph diffusion models with guidance from large language models (LLMs) for the first time, to the best of our knowledge. The problem is highly non-trivial and poses the following challenges: 1) how to align LLMs to understand the irregular graph structures and the graph properties hidden in human instructions, and 2) how to align graph diffusion models to follow natural language instructions so that the generated graphs carry the relational semantics humans expect. To address these challenges, we propose a novel LLM-aligned Graph Diffusion Model (LLM-GDM) that generates graphs based on natural language instructions. In particular, we first propose self-supervised text-graph alignment, which empowers LLMs to accurately understand graph structures and properties by finetuning them on several specially designed alignment tasks involving various graph components such as nodes, edges, and subgraphs. We then propose a structure-aware cross-attention mechanism that guides the diffusion model to follow human instructions by inherently capturing the relational semantics between texts and structures. Extensive experiments on both synthetic and real-world molecular datasets demonstrate the effectiveness of our proposed LLM-GDM over existing baselines.
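The abstract gives no implementation details, but the structure-aware cross-attention it describes can be illustrated with a minimal sketch: noisy graph node states act as queries attending over instruction-token embeddings from the aligned LLM (keys/values), with a neighborhood term mixed into the queries so the attention weights depend on the current graph topology. This is a hypothetical reading under stated assumptions, not the authors' actual mechanism; all module and parameter names (e.g., StructureAwareCrossAttention, struct_proj) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareCrossAttention(nn.Module):
    """Illustrative sketch (not the authors' code): graph node states attend
    to LLM instruction embeddings, with a structural term mixed into the
    queries so attention is conditioned on the current graph topology."""

    def __init__(self, node_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        self.q = nn.Linear(node_dim, hidden_dim)
        self.k = nn.Linear(text_dim, hidden_dim)
        self.v = nn.Linear(text_dim, hidden_dim)
        # Assumed design choice: inject mean-pooled neighbor features into queries.
        self.struct_proj = nn.Linear(node_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, node_dim)

    def forward(self, node_h: torch.Tensor, text_h: torch.Tensor, adj: torch.Tensor):
        # node_h: (B, N, node_dim) noisy node states from the diffusion model
        # text_h: (B, T, text_dim) instruction embeddings from the aligned LLM
        # adj:    (B, N, N) adjacency of the current graph sample
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        neigh = adj @ node_h / deg                      # mean-pool neighbor features
        q = self.q(node_h) + self.struct_proj(neigh)    # structure-aware queries
        k, v = self.k(text_h), self.v(text_h)
        attn = F.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return self.out(attn @ v)                       # text-conditioned node update

# Toy usage: 2 graphs with 5 nodes each, 7 instruction tokens
layer = StructureAwareCrossAttention(node_dim=64, text_dim=128, hidden_dim=64)
node_h = torch.randn(2, 5, 64)
text_h = torch.randn(2, 7, 128)
adj = torch.randint(0, 2, (2, 5, 5)).float()
updated = layer(node_h, text_h, adj)                    # shape: (2, 5, 64)
```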
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11171