Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation

Kai Xu; Georgi Ganev; Emile Joubert; Rees Davison; Olivier Van Acker; Luke Robinson

Published: 01 Feb 2023, Last Modified: 01 Mar 2023, ICLR 2023 poster, Readers: Everyone

Keywords: synthetic data generation, random graph generation, differential privacy

TL;DR: We synthesise datasets with many-to-many relationships by first generating the relationships via random graph generation and then generating the data attributes.

Abstract: Synthetic data generation (SDG) has become a popular approach to releasing private datasets. In SDG, a generative model is fitted to the private real data, and samples drawn from the model are released as the protected synthetic data. While real-world datasets usually consist of multiple tables with potential many-to-many relationships (i.e., many-to-many datasets), recent research in SDG mostly focuses on modeling tables independently or only considers generating datasets with special cases of many-to-many relationships, such as one-to-many. In this paper, we first study the challenges of building faithful generative models for many-to-many datasets, identifying limitations of existing methods. We then present a novel factorization for many-to-many generative models, which leads to a scalable generation framework by combining recent results from random graph theory and representation learning. Finally, we extend the framework to satisfy (ε, δ)-differential privacy. On a real-world dataset, we demonstrate that our method can generate synthetic datasets while preserving information within and across tables better than its closest competitor.
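The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the two-stage idea in the TL;DR (relationships first, attributes second), under our own assumptions. The relationship graph between two tables is sampled from a bipartite configuration model, a standard random-graph model that reproduces given degree sequences exactly; it is swapped in purely for illustration and is not necessarily the random graph model the authors use. The attribute samplers `sample_left`, `sample_right` and `sample_link` are hypothetical placeholders standing in for learned generative models, and no differential privacy mechanism is included.

```python
import random
from collections import Counter


def bipartite_configuration_model(left_degrees, right_degrees, rng):
    """Sample a random bipartite multigraph whose left/right degree sequences
    match the given ones exactly, by uniformly pairing edge "stubs".
    Note: duplicate edges between the same pair of rows are possible."""
    if sum(left_degrees) != sum(right_degrees):
        raise ValueError("degree sequences must have equal sums")
    # One stub per unit of degree, labelled with the owning row index.
    left_stubs = [i for i, d in enumerate(left_degrees) for _ in range(d)]
    right_stubs = [j for j, d in enumerate(right_degrees) for _ in range(d)]
    rng.shuffle(right_stubs)  # a uniform random matching of stubs
    return list(zip(left_stubs, right_stubs))


def generate_many_to_many(left_degrees, right_degrees,
                          sample_left, sample_right, sample_link, seed=0):
    """Hypothetical two-stage generator: (1) sample the many-to-many
    relationships as a random bipartite graph, (2) sample row attributes,
    with link-table attributes conditioned on the two rows they connect."""
    rng = random.Random(seed)
    edges = bipartite_configuration_model(left_degrees, right_degrees, rng)
    left_rows = [sample_left(rng) for _ in left_degrees]
    right_rows = [sample_right(rng) for _ in right_degrees]
    link_rows = [(i, j, sample_link(left_rows[i], right_rows[j], rng))
                 for i, j in edges]
    return left_rows, right_rows, link_rows


if __name__ == "__main__":
    # Toy example: students enrolled in courses (all values made up).
    left_deg = [2, 1, 3]   # courses taken by each student
    right_deg = [3, 3]     # enrolments per course
    students, courses, enrolments = generate_many_to_many(
        left_deg, right_deg,
        sample_left=lambda rng: {"age": rng.randint(18, 30)},
        sample_right=lambda rng: {"credits": rng.choice([5, 10])},
        sample_link=lambda s, c, rng: {"grade": rng.randint(40, 100)},
    )
    # Degree sequences are preserved exactly, whatever the random matching.
    assert sorted(Counter(i for i, _, _ in enrolments).items()) == [(0, 2), (1, 1), (2, 3)]
    assert sorted(Counter(j for _, j, _ in enrolments).items()) == [(0, 3), (1, 3)]
```

In a private setting the degree sequences would themselves have to be estimated from the real data (and, for differential privacy, released through a suitable mechanism); here they are simply given as inputs to keep the sketch short.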

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Generative models
