Tokenize, Diffuse, Decode: A Generative Approach to Neighborhood Discovery on Graphs

Published: 03 Mar 2026, Last Modified: 05 Mar 2026
Venue: ICLR 2026 DeLTa Workshop Poster
License: CC BY 4.0
Keywords: Diffusion Model, Graph Representation Learning, Tokenization
TL;DR: Node neighborhood discovery through discrete diffusion in a learned semantic token space.
Abstract: Node neighbor discovery is a central component of graph representation learning, yet most existing methods rely on heuristic sampling or deterministic neighborhood expansion that limits adaptivity and robustness. SemWalk introduces a generative approach to neighbor discovery that models a conditional distribution over informative neighbor sequences given a source node, using discrete diffusion over semantically tokenized graph representations. The method first learns a discrete semantic space by training a Residual Quantized Variational Autoencoder (RQ-VAE) to tokenize continuous node embeddings, and then trains an order-agnostic autoregressive diffusion model (OA-ARDM) in this space to generate permutation-invariant neighbor sequences. At inference time, discrete neighbors are sampled conditioned on the source node’s semantic token and decoded back into continuous embeddings via the RQ-VAE decoder, enabling diverse and high-quality neighborhood generation. Empirical results on large-scale multi-graph benchmarks show that SemWalk consistently matches or outperforms established baselines such as Personalized PageRank (PPR), with particularly strong robustness under test-time noise and graph heterogeneity, while remaining fully inductive and generalizing to unseen graphs.
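The tokenize and decode steps described in the abstract can be illustrated with a minimal sketch of residual quantization, the mechanism at the core of an RQ-VAE. This is not the authors' implementation. It assumes fixed, hand-written codebooks and omits the learned encoder and decoder networks as well as the diffusion model. The names `rq_encode`, `rq_decode`, and `codebooks` are hypothetical.

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization: pick one code index per level.

    x: 1-D embedding of shape (d,).
    codebooks: list of arrays, each of shape (K, d) (assumed given, not learned here).
    """
    residual = np.asarray(x, dtype=float).copy()
    tokens = []
    for cb in codebooks:
        # Nearest code to the current residual under squared Euclidean distance.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        # The next level quantizes whatever this level failed to explain.
        residual = residual - cb[idx]
    return tokens

def rq_decode(tokens, codebooks):
    """Reconstruct an embedding by summing the selected code from each level."""
    return np.sum([cb[i] for cb, i in zip(codebooks, tokens)], axis=0)

# Toy two-level codebooks (illustrative values only).
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.5, -0.5]]),
]
tokens = rq_encode(np.array([1.4, 0.6]), codebooks)
recon = rq_decode(tokens, codebooks)
```

In this sketch a node embedding becomes a short tuple of integer tokens, one per quantization level. A generative model over such tokens could then emit neighbor tokens that `rq_decode` maps back to continuous vectors, which is the role the abstract assigns to the RQ-VAE decoder.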
Submission Number: 106