Tokenize, Diffuse, Decode: A Generative Approach to Neighborhood Discovery on Graphs

Zhuowen Yuan; Tao Liu; Kaushik Rangadurai; Yang Yang; Minhui Huang; Yiping Han; Bo Li; Shuang Yang

Published: 03 Mar 2026, Last Modified: 07 Apr 2026

Venue: ICLR 2026 DeLTa Workshop Poster

License: CC BY 4.0

Keywords: Diffusion Model, Graph Representation Learning, Tokenization

TL;DR: Node neighborhood discovery through discrete diffusion in a learned semantic token space.

Abstract: Node neighbor discovery is a central component of graph representation learning, yet most existing methods rely on heuristic sampling or deterministic neighborhood expansion, both of which limit adaptivity and robustness. SemWalk takes a generative approach to neighbor discovery: it models a conditional distribution over informative neighbor sequences given a source node, using discrete diffusion over semantically tokenized graph representations. The method first learns a discrete semantic space by training a Residual Quantized Variational Autoencoder (RQ-VAE) to tokenize continuous node embeddings, and then trains an order-agnostic autoregressive diffusion model (OA-ARDM) in this space to generate permutation-invariant neighbor sequences. At inference time, discrete neighbors are sampled conditioned on the source node's semantic token and decoded back into continuous embeddings via the RQ-VAE decoder, enabling diverse, high-quality neighborhood generation. Empirical results on large-scale multi-graph benchmarks show that SemWalk consistently matches or outperforms established baselines such as Personalized PageRank (PPR), with particularly strong robustness under test-time noise and graph heterogeneity, while remaining fully inductive and generalizing to unseen graphs.
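The tokenize and decode steps rest on residual quantization. The following is a minimal sketch that uses fixed, hand-picked codebooks in place of learned ones. It shows how a continuous embedding becomes a tuple of discrete tokens, one per quantization level, and how a token tuple maps back to a vector. In an RQ-VAE, a learned encoder produces the embedding, and a learned decoder network is applied to the summed codewords. Both networks and the codebook training are omitted here, and the sum itself stands in for the reconstruction. The names `rq_encode`, `rq_decode`, and `CODEBOOKS` are hypothetical and are not SemWalk's code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebooks = Sequence[Sequence[Vector]]  # one codebook per quantization level


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rq_encode(z: Sequence[float], codebooks: Codebooks) -> Tuple[int, ...]:
    """Greedy residual quantization: emit one token id per codebook level.

    Each level picks the codeword nearest to the current residual, then
    subtracts it so the next (finer) level quantizes what is left over.
    """
    residual = list(z)
    tokens = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda k: _sq_dist(residual, book[k]))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tuple(tokens)


def rq_decode(tokens: Sequence[int], codebooks: Codebooks) -> Vector:
    """Map a token tuple back to a vector: the sum of its selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(tokens):
        out = [o + c for o, c in zip(out, codebooks[level][idx])]
    return out


# Toy, hand-picked 2-D codebooks (hypothetical): a coarse level, then a fine level.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]
tokens = rq_encode([1.2, 0.1], CODEBOOKS)  # (1, 1)
approx = rq_decode(tokens, CODEBOOKS)      # [1.25, 0.0]
```

The coarse level captures most of the vector, and the fine level corrects the leftover residual. This makes the token tuple a coarse-to-fine semantic identifier for the node.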
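A small sampler can illustrate the order-agnostic generation step. This minimal sketch makes two simplifying assumptions. First, each neighbor is a single integer id from a flat vocabulary rather than a multi-level semantic token tuple. Second, `toy_predictor` is a hypothetical stand-in for the trained OA-ARDM network, and `oa_ardm_sample` is likewise an illustrative name. The sampler fills positions in a uniformly random order, so it assumes no canonical ordering of neighbors. An order-agnostic model is trained for exactly this setting.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

SourceToken = Tuple[int, ...]
# Hypothetical model interface: (source token, partially generated sequence,
# position to fill) -> unnormalized weights over a flat token vocabulary.
Predictor = Callable[[SourceToken, List[Optional[int]], int], Sequence[float]]


def oa_ardm_sample(source: SourceToken, length: int, predict: Predictor,
                   rng: random.Random) -> List[int]:
    """Order-agnostic autoregressive sampling of a neighbor-token sequence.

    Start fully masked (None), draw a uniformly random generation order,
    and fill one position per step by sampling from the model's weights,
    conditioned on the source token and everything generated so far.
    """
    seq: List[Optional[int]] = [None] * length
    order = list(range(length))
    rng.shuffle(order)
    for pos in order:
        weights = predict(source, seq, pos)
        seq[pos] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    filled = [t for t in seq if t is not None]
    assert len(filled) == length  # every masked position has been generated
    return filled


def toy_predictor(source: SourceToken, seq: List[Optional[int]], pos: int) -> List[float]:
    """Stand-in for a trained network: uniform over 5 tokens not yet used.

    It ignores `source` and `pos`; a real model would condition on both.
    """
    used = {t for t in seq if t is not None}
    return [0.0 if v in used else 1.0 for v in range(5)]


rng = random.Random(0)
neighbors = oa_ardm_sample((1, 1), length=3, predict=toy_predictor, rng=rng)
```

In the full pipeline, each generated token would then be passed through the RQ-VAE decoder to recover a continuous neighbor embedding.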

Submission Number: 106
