Coherent Multi-Table Data Synthesis for Tabular and Time-Series Data with GANs

Published: 01 Jan 2024, Last Modified: 14 Nov 2024, Venue: CODASPY 2024, License: CC BY-SA 4.0
Abstract: As regulatory institutions increasingly monitor the use of private user data for security purposes, transferring such data becomes more constrained. Synthetic data has recently emerged as a viable alternative that prevents the disclosure of protected user information while complying with data-sharing regulations. Both the public and private sectors commonly use a combination of tabular and time-series tables that often contain sensitive user-related information. These tables are usually intrinsically interlinked, as they describe the users and their behaviors over different perimeters. Moreover, they contain both numerical and categorical features, which adds complexity to the anonymization task. State-of-the-art generative methods, specialized in either tabular or time-series data, are able to generate high-quality synthetic data. However, if each table is generated independently, it becomes impossible to link them, and the usability of the synthetic data suffers. To address this issue, we propose a coherent multi-table generative model that uses Generative Adversarial Networks (GANs) to sample both tabular and time-series tables, as well as a conditional time-series generative model that handles both numerical and categorical features. Additionally, we conduct experiments to analyse the inner modules of our model and evaluate it on an in-house private dataset, in order to demonstrate the viability of the generated synthetic data for machine learning tasks.