Keywords: relational deep learning, graph neural networks, relational database, diffusion models, stochastic blockmodels
TL;DR: RelDiff is a new method for generating synthetic relational databases. It models the database structure as a graph and uses a diffusion process to generate the table data, outperforming existing methods in utility and fidelity.
Abstract: Real-world databases are predominantly relational, comprising multiple interlinked tables that contain complex structural and statistical dependencies.
Learning generative models on relational data has shown great promise for producing synthetic data, which can unlock access to previously underutilized information and support the training of powerful foundation models.
However, existing methods often struggle to capture this complexity, typically reducing relational data to conditionally generated flat tables and imposing limiting structural assumptions.
To address these limitations, we introduce RelDiff, a novel diffusion generative model that synthesizes relational databases by explicitly modeling their foreign key graph structure.
RelDiff combines a joint graph-conditioned diffusion process across all tables for attribute synthesis and a D2K+SBM graph generator based on the stochastic block model for structure generation.
This decomposition into graph structure and relational attributes ensures both high fidelity and referential integrity, two crucial requirements for synthetic relational database generation.
RelDiff achieves state-of-the-art performance in generating synthetic relational databases on 11 benchmark datasets.
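As a rough illustration of the structure/attribute decomposition described in the abstract, the sketch below first samples a foreign-key structure between a toy "customers" and "orders" table with a simple stochastic block model, then fills in attributes conditioned on that structure. This is not the authors' RelDiff implementation: the Gaussian attribute step merely stands in for the graph-conditioned diffusion model, and all table names, sizes, and block parameters are assumptions made for illustration.

```python
# Toy sketch (not the authors' code) of two-stage relational synthesis:
# (1) generate foreign-key structure with a stochastic block model,
# (2) synthesize attributes conditioned on that structure.
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: structure generation with a simple SBM ---------------------
n_customers, n_orders = 50, 200           # parent / child table sizes (assumed)
k = 3                                     # number of latent blocks (assumed)
cust_blocks = rng.integers(0, k, size=n_customers)
order_blocks = rng.integers(0, k, size=n_orders)

# Block-to-block affinities: each order picks its parent customer with
# probability proportional to the affinity between their blocks.
affinity = rng.uniform(0.1, 1.0, size=(k, k))

fk = np.empty(n_orders, dtype=int)        # foreign key: order -> customer
for o in range(n_orders):
    w = affinity[order_blocks[o], cust_blocks]   # weight for each candidate parent
    fk[o] = rng.choice(n_customers, p=w / w.sum())

# Every order references an existing customer, so referential integrity
# holds by construction of the sampled graph.

# --- Stage 2: attribute synthesis conditioned on the structure -----------
# Placeholder for the graph-conditioned diffusion step: customer attributes
# depend on their block, and each order's attribute depends on its parent.
cust_value = rng.normal(loc=cust_blocks.astype(float), scale=0.3)
order_amount = cust_value[fk] + rng.normal(scale=0.5, size=n_orders)

print("orders per customer (first 10):", np.bincount(fk, minlength=n_customers)[:10])
print("example synthetic (fk, amount) rows:",
      list(zip(fk[:5], np.round(order_amount[:5], 2))))
```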
Supplementary Material: zip
Primary Area: generative models
Submission Number: 21712