Directed Graph Grammars for Sequence-based Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We establish a principled bijective problem mapping between DAG modeling and sequence modeling.
Abstract: Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact, and equivalent sequential representation of a DAG. Specifically, we view a graph as a derivation over an unambiguous grammar, so that each DAG corresponds to a unique sequence of production rules. Equivalently, the procedure that constructs such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the representational continuity of sequences for Bayesian optimization over structured data.
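The core idea of the abstract can be illustrated with a deliberately simplified sketch: treat a DAG as a sequence of "attach a node with these parents" rules, made unique by a fixed canonical ordering, and recover the DAG losslessly from that sequence. This toy code is an assumption-laden stand-in for intuition only; the node names, the canonicalization, and the rule format are illustrative and are not DIGGED's learned grammar or production rules.

```python
# Toy illustration (NOT the DIGGED grammar): encode a DAG as a unique
# sequence of "attach node" rules via a canonical topological order,
# then decode the sequence back into the same DAG losslessly.
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (new node, its parents)

def encode(dag: Dict[str, Set[str]]) -> List[Rule]:
    """dag maps each node to its set of parents; returns a rule sequence."""
    remaining = dict(dag)
    placed: Set[str] = set()
    rules: List[Rule] = []
    while remaining:
        # Canonical choice: smallest-named node whose parents are all placed.
        ready = sorted(n for n, parents in remaining.items() if parents <= placed)
        node = ready[0]
        rules.append((node, tuple(sorted(remaining.pop(node)))))
        placed.add(node)
    return rules

def decode(rules: List[Rule]) -> Dict[str, Set[str]]:
    """Rebuild the DAG from the rule sequence."""
    return {node: set(parents) for node, parents in rules}

dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
assert decode(encode(dag)) == dag  # lossless round trip
```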
Lay Summary: Directed acyclic graphs (DAGs) are used to represent everything from electronic circuits to neural network architectures, but they are hard to generate because there is no single “correct” order in which to build their nodes and edges; the resulting order-dependent sequential descriptions make decoders brittle. We introduce Directed Graph Grammar Embedded Derivations (DIGGED), which rewrites a DAG as a sequence of rule-based construction steps. We learn these rules by repeatedly mining common patterns, solving compatibility puzzles to derive maximally shared rewrite rules under a description-length minimization objective, and iteratively compressing the provided data without losing information. Crucially, the DIGGED representation establishes a one-to-one mapping between DAG generation and sequence modeling, and we take full advantage of it by training Transformer-based decoders for generation and downstream optimization. On benchmarks spanning neural architectures, Bayesian networks, and analog circuits, DIGGED achieves perfect validity and uniqueness, delivers substantial gains in predictive accuracy and Bayesian optimization performance over prior DAG decoders, and yields an interpretable, compact representation that bridges graph-structured data with powerful sequence-based methods. By turning any DAG collection into a compact, principled “design language,” DIGGED opens the door to more reliable, scalable graph design and optimization across scientific and engineering domains.
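The "description length minimization" step in the lay summary can be sketched as an MDL-style ranking of mined patterns: promoting a pattern to a rewrite rule pays off when the cost of referencing it once per occurrence, plus defining it once, beats spelling it out everywhere. The cost model and the candidate pattern names below are hypothetical placeholders, not the paper's actual objective.

```python
# Hedged sketch of an MDL-style criterion for choosing which mined pattern
# to promote into a rewrite rule. The cost model and the candidates are
# illustrative assumptions, not DIGGED's actual objective.
def compression_gain(pattern_size: int, occurrences: int) -> int:
    cost_before = occurrences * pattern_size  # spell out every occurrence in full
    cost_after = occurrences + pattern_size   # one symbol per use, plus defining the rule once
    return cost_before - cost_after

# Hypothetical candidates: pattern name -> (number of nodes, occurrence count).
candidates = {"conv-bn-relu": (3, 40), "add-mul": (2, 15)}
best = max(candidates, key=lambda name: compression_gain(*candidates[name]))
print(best, compression_gain(*candidates[best]))  # keep the rule that compresses the data most
```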
Link To Code: https://github.com/shiningsunnyday/induction
Primary Area: Deep Learning->Graph Neural Networks
Keywords: graph generative model, graph mining, grammar, neurosymbolic, directed acyclic graphs, DAG
Submission Number: 8812