Graph World Model

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: The Graph World Model (GWM) efficiently integrates unstructured and graph-structured multi-modal data, enhancing diverse tasks through a unified framework.
Abstract: World models (WMs) demonstrate strong capabilities in prediction, generation, and planning tasks. Existing WMs primarily focus on unstructured data and cannot leverage the structured data, often represented as graphs, that is ubiquitous in the digital world. While multiple graph foundation models have been proposed, they focus on graph learning tasks and cannot extend to diverse multi-modal data and interdisciplinary tasks. To address these challenges, we propose the Graph World Model (GWM), a world model that supports both unstructured and graph-structured states with multi-modal information and represents diverse tasks as actions. The core of a GWM is a generic message-passing algorithm that aggregates structured information, either over a unified multi-modal token space by converting multi-modal data into text (GWM-T) or over a unified multi-modal embedding space produced by modality-specific encoders (GWM-E). Notably, GWM introduces action nodes to support diverse tasks; action nodes are linked to other nodes via direct reference or similarity computation. Extensive experiments on 6 tasks from diverse domains (multi-modal generation and matching, recommendation, graph prediction, multi-agent, retrieval-augmented generation, and planning and optimization) show that the same GWM outperforms or matches domain-specific baselines, benefits from multi-hop structures, and demonstrates strong zero-shot/few-shot capabilities on unseen tasks. Our code for GWM is released at https://github.com/ulab-uiuc/GWM.
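For a concrete picture of the embedding-space variant (GWM-E), the following is a minimal sketch under stated assumptions, not the authors' implementation: embeddings are given directly as toy vectors (standing in for the outputs of modality-specific encoders), message passing is plain mean aggregation, and an action node is linked to its directly referenced nodes plus the top-k remaining nodes by cosine similarity. The functions `cosine`, `link_action_node`, and `message_pass`, the parameter `k`, and all data are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link_action_node(action: Vector, nodes: Dict[str, Vector],
                     refs: List[str], k: int) -> List[str]:
    """Hypothetical linking rule: keep the directly referenced nodes, then add
    the k remaining nodes most similar to the action embedding."""
    candidates = [n for n in nodes if n not in refs]
    ranked = sorted(candidates, key=lambda n: cosine(action, nodes[n]), reverse=True)
    return list(refs) + ranked[:k]


def message_pass(emb: Dict[str, Vector], adj: Dict[str, List[str]],
                 hops: int) -> Dict[str, Vector]:
    """Mean-aggregation message passing (an assumed aggregator): at each hop,
    every node becomes the average of itself and its neighbors from the previous hop."""
    h = {n: list(v) for n, v in emb.items()}
    for _ in range(hops):
        new_h = {}
        for n, v in h.items():
            group = [v] + [h[m] for m in adj.get(n, [])]
            new_h[n] = [sum(col) / len(group) for col in zip(*group)]
        h = new_h
    return h


# Toy usage: three content nodes plus one hypothetical action node "act".
nodes = {"img": [1.0, 0.0], "txt": [0.0, 1.0], "doc": [0.9, 0.1]}
action = [1.0, 0.0]
links = link_action_node(action, nodes, refs=["txt"], k=1)  # ["txt", "img"]
emb = dict(nodes, act=action)
adj = {"act": links, **{n: ["act"] for n in links}}  # undirected action edges
h = message_pass(emb, adj, hops=1)
```

After one hop, the action node's vector mixes in the text node it references directly and the image node it was matched to by similarity, while the unlinked `doc` node is unchanged. Mean aggregation is used only for readability; any other aggregator could be substituted.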
Lay Summary: Structured data like graphs are everywhere—in social networks, molecules, and recommender systems—but most world models only work with unstructured inputs like images or text. We wanted to explore whether a world model could understand and act in graph-based environments just as well as it does in pixel-based ones. Our paper introduces the Graph World Model (GWM), which treats each state as a graph and each task as an “action node” connected to relevant information. We designed two versions: one based on language tokens, and another on compact embeddings, both using message-passing to integrate context. Surprisingly, GWM performs competitively with domain-specific models across six diverse tasks, despite using a unified architecture. This challenges the assumption that structured and unstructured data need separate models. Our findings suggest that bridging graphs and world models unlocks new paths for general AI systems that can plan, reason, and generate across data types and domains.
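The token-based version (GWM-T) is described only as converting multi-modal data into text and passing messages over a unified token space. As a rough illustration, the sketch below assumes the simplest possible token-space aggregation: collect the action node's multi-hop neighborhood by breadth-first search and concatenate the node texts into one prompt. The names `gather_context` and `build_prompt`, the prompt format, and the toy recommendation graph are hypothetical; the actual GWM-T aggregation is defined in the linked repository and may differ.

```python
from typing import Dict, List


def gather_context(graph: Dict[str, List[str]], start: str, hops: int) -> List[str]:
    """Breadth-first search: return `start` plus every node within `hops` edges,
    in the order first visited."""
    seen = [start]
    frontier = [start]
    for _ in range(hops):
        nxt = []
        for n in frontier:
            for m in graph.get(n, []):
                if m not in seen:
                    seen.append(m)
                    nxt.append(m)
        frontier = nxt
    return seen


def build_prompt(texts: Dict[str, str], graph: Dict[str, List[str]],
                 action: str, hops: int) -> str:
    """Serialize the action node and its multi-hop neighborhood as one text prompt.
    In a token-space setup this string would be passed to a language model;
    that call is omitted here."""
    lines = [f"Task: {texts[action]}"]
    for n in gather_context(graph, action, hops)[1:]:
        lines.append(f"Context ({n}): {texts[n]}")
    return "\n".join(lines)


# Hypothetical toy graph; every node, including the movie poster, is assumed
# to have been converted to text beforehand.
texts = {
    "act": "Recommend an item for user u1.",
    "u1": "User u1 likes sci-fi films.",
    "m1": "Poster caption for movie m1: a starship above a red planet.",
    "m2": "Movie m2: a romantic comedy.",
}
graph = {"act": ["u1"], "u1": ["act", "m1"], "m1": ["u1"], "m2": []}
one_hop = build_prompt(texts, graph, "act", hops=1)
two_hop = build_prompt(texts, graph, "act", hops=2)
```

Raising `hops` from 1 to 2 adds m1 (reached through u1) to the prompt, while the unconnected m2 stays out; this is the kind of multi-hop structure the abstract reports GWM benefits from.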
Link To Code: https://github.com/ulab-uiuc/GWM
Primary Area: Deep Learning->Graph Neural Networks
Keywords: Graph world model, Unstructured data, Multi-modal information, Zero-shot/few-shot capabilities
Submission Number: 14809