Keywords: Graph-to-Sequence, Autoregressive models, Non-autoregressive models, Diffusion Models
TL;DR: Graph-to-Sequence generation with a Graph-Aware Diffusion Framework
Abstract: Pre-trained language models (PLMs) remain unreliable for graph-to-sequence (G2S) generation, where two challenges are particularly acute: (i) factual grounding, ensuring that all entities are faithfully realized, and (ii) edit sensitivity, ensuring that small, local graph edits propagate consistently to the output. We propose Diffusion Language Models for Graphs (DLM4G), a non-autoregressive framework that iteratively refines the output sequence conditioned on the graph input. Central to DLM4G is a graph-aware adaptive noising strategy, in which noise is applied to the output sequence in alignment with the graph components (entities and relations) according to a learnable component-wise schedule; the schedule is learned by linearly mapping each component's denoising loss to its noise level. This ensures that entities are generated faithfully and keeps graph edits localized in the text. In extensive experiments on three benchmark datasets, DLM4G outperforms state-of-the-art autoregressive baselines that are 12–127× larger, achieving 10–15\% relative gains on standard surface-level metrics (BLEU, ChrF++, METEOR) and embedding-based metrics (BERTScore-F1, MAUVE). More importantly, compared to comparably sized autoregressive baselines, DLM4G improves factual grounding (FGT, $\uparrow$) by $+4.7\%$ and edit sensitivity (EDR, $\uparrow$) by $+7.9\%$ on average. Finally, we evaluate on molecule captioning, where molecular graphs are verbalized into textual descriptions, demonstrating the applicability of DLM4G to biomedical G2S tasks.
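To make the component-wise schedule concrete, below is a minimal, illustrative sketch of graph-aware adaptive noising. It is not the authors' implementation: the function names (`component_noise_levels`, `apply_graph_aware_noise`), the Gaussian corruption of token embeddings, and the assumption that components with higher recent denoising loss receive *less* noise are all assumptions made for illustration; the paper may operate on discrete tokens and map loss to noise in the opposite direction.

```python
import torch

def component_noise_levels(component_losses, t, min_scale=0.5, max_scale=1.5):
    """Linearly map per-component denoising losses to per-component noise levels.

    Assumption: components with higher recent loss (e.g. hard entity spans)
    get a lower noise scale at step t, so they are corrupted less aggressively.
    """
    lo, hi = component_losses.min(), component_losses.max()
    norm = (component_losses - lo) / (hi - lo + 1e-8)       # normalize losses to [0, 1]
    scale = max_scale - norm * (max_scale - min_scale)       # high loss -> low scale
    return t * scale                                         # per-component noise level

def apply_graph_aware_noise(token_embeds, comp_ids, comp_losses, t):
    """Corrupt output-token embeddings with Gaussian noise whose magnitude depends
    on the graph component (entity or relation) each token is aligned to."""
    levels = component_noise_levels(comp_losses, t)          # (num_components,)
    per_token = levels[comp_ids].unsqueeze(-1)               # (seq_len, 1), via alignment
    noise = torch.randn_like(token_embeds)
    return torch.sqrt(1 - per_token) * token_embeds + torch.sqrt(per_token) * noise

# Toy usage: 6 output tokens aligned to 3 graph components.
embeds = torch.randn(6, 16)
comp_ids = torch.tensor([0, 0, 1, 1, 2, 2])         # token -> component alignment
comp_losses = torch.tensor([0.9, 0.2, 0.5])         # hypothetical per-component losses
noisy = apply_graph_aware_noise(embeds, comp_ids, comp_losses, t=torch.tensor(0.3))
```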
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 17569