Keywords: relational data, synthetic data, heterogeneous graphs, relational deep learning, tabular data
TL;DR: This paper presents a novel method for synthetic relational data generation by leveraging Graph Neural Networks (GNNs) and achieves state-of-the-art performance in terms of multi-table fidelity.
Abstract: Relational data synthesis is a complex task that requires effective modeling of mixed data types spread across multiple tables connected by foreign key constraints. Most research in tabular data synthesis has focused on single tables, and as a result current approaches fail to model the relational aspects of the data: most methods do not explicitly model the topological structure of the data and struggle to capture dependencies between columns in different tables. To address these challenges, we introduce a novel approach that represents relational data as a graph induced by its foreign key constraints, which lets us leverage the expressive power of graph neural networks (GNNs) to capture the structure of the data. Our method uses GNN embeddings to condition a tabular latent score-based diffusion model. This combination allows the model to capture relationships between tables while preserving the structural and statistical properties of the data. We demonstrate the effectiveness of our approach on six benchmark datasets in terms of multi-table fidelity and utility metrics.
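To make the "graph induced by foreign key constraints" concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. Each row becomes a node keyed by (table, primary key), and each foreign key reference becomes an undirected edge. The toy tables, the names `TABLES`, `FOREIGN_KEYS`, `build_graph`, `node_features`, and `aggregate`, and the aggregation rule are all hypothetical. In place of a trained heterogeneous GNN, `aggregate` performs a single untrained message-passing step (own features plus per-neighbor-table column means), so it only illustrates how per-row context vectors could be derived for conditioning a generative model.

```python
from collections import defaultdict

# Hypothetical toy schema: table -> {primary key: row of column values}.
TABLES = {
    "users": {1: {"age": 30.0}, 2: {"age": 50.0}},
    "orders": {
        10: {"amount": 20.0, "user_id": 1},
        11: {"amount": 40.0, "user_id": 1},
        12: {"amount": 90.0, "user_id": 2},
    },
}
# Foreign keys as (child_table, fk_column, parent_table).
FOREIGN_KEYS = [("orders", "user_id", "users")]


def build_graph(tables, foreign_keys):
    """Undirected adjacency list over (table, pk) nodes, one edge per FK reference."""
    adj = {}
    for table, rows in tables.items():
        for pk in rows:
            adj.setdefault((table, pk), [])  # rows with no references are still nodes
    for child, fk_col, parent in foreign_keys:
        for pk, row in tables[child].items():
            parent_node = (parent, row[fk_col])
            adj[(child, pk)].append(parent_node)
            adj[parent_node].append((child, pk))
    return adj


def node_features(tables, foreign_keys):
    """Per-row attribute features, excluding foreign key columns (they become edges)."""
    fk_cols = {(child, col) for child, col, _ in foreign_keys}
    return {
        (t, pk): {c: v for c, v in row.items() if (t, c) not in fk_cols}
        for t, rows in tables.items()
        for pk, row in rows.items()
    }


def aggregate(adj, feats):
    """One untrained, type-aware message-passing step (stand-in for a hetero GNN).

    Each node keeps its own features and adds, for every neighboring table,
    the mean of that table's columns over its neighbors. Assumes all rows of a
    table share the same columns.
    """
    out = {}
    for node, nbrs in adj.items():
        emb = dict(feats[node])
        by_table = defaultdict(list)
        for nb in nbrs:
            by_table[nb[0]].append(feats[nb])
        for t, rows in by_table.items():
            for col in rows[0]:
                emb[f"mean_{t}_{col}"] = sum(r[col] for r in rows) / len(rows)
        out[node] = emb
    return out


if __name__ == "__main__":
    adj = build_graph(TABLES, FOREIGN_KEYS)
    emb = aggregate(adj, node_features(TABLES, FOREIGN_KEYS))
    print(emb[("users", 1)])  # {'age': 30.0, 'mean_orders_amount': 30.0}
```

In a full method, these per-row vectors would be replaced by learned GNN embeddings and passed as the conditioning input of the latent diffusion model; the sketch only shows how foreign keys turn a multi-table database into a graph whose neighborhoods carry cross-table context.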
Submission Number: 81