TabGFN: Tabular data generation based on GFlowNets

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: generative model, synthetic tabular data, GFlowNets, GAN
TL;DR: We propose TabGFN, a novel approach for generating high-quality synthetic tabular data with traceability, which combines GFlowNets with the critic network of a WGAN to represent conditional relationships among features and their generation order.
Abstract: The generation of synthetic tabular data plays an important role in privacy-preserving data sharing, training data augmentation, data imputation, and algorithm development in domains such as healthcare and finance. Achieving both high predictive performance and model traceability in tabular data generation is challenging for neural-network-based algorithms due to their inherent opacity. To overcome this limitation, we present TabGFN, a novel approach for generating synthetic tabular data. It employs generative flow networks (GFlowNets) for feature generation and uses the critic network of a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) as its reward function. Through simultaneous and iterative training of the flow network and the reward function, TabGFN explores a directed acyclic graph over the generative state space, yielding a generative model that represents conditional relationships between features and their generation order. Benchmark tests on diverse datasets demonstrate that the quality of the synthetic data generated by TabGFN is superior or comparable to that of state-of-the-art algorithms. Moreover, the entire generation process is traceable, as its individual steps are explicitly provided. This traceability enables the discovery of mutual dependencies between features, leading to an interpretable model, which is crucial for high-stakes decision-making. Thus, the proposed approach offers an effective solution for generating tabular data, providing both high-quality synthesis and traceability.
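The core idea described in the abstract — a GFlowNet that builds a table row feature by feature and is trained so that complete rows are sampled in proportion to a critic's reward — can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: the row has only two binary features, the WGAN-GP critic is replaced by a fixed per-row reward table, and the flow network is a tabular policy trained with the trajectory-balance objective (in the paper, the reward is a learned critic trained jointly with the sampler).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "table": each row has two binary features (f0, f1), generated sequentially.
# Stand-in for the WGAN-GP critic: a fixed reward per complete row
# (hypothetical values; in TabGFN this reward is a jointly trained network).
reward = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}

# Tabular forward policy: logits for f0, then f1 given f0, plus log-partition logZ.
logits_f0 = np.zeros(2)
logits_f1 = np.zeros((2, 2))  # row indexed by the chosen value of f0
log_z = 0.0

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr = 0.1
for step in range(5000):
    # Sample one trajectory through the DAG: root -> f0 -> (f0, f1).
    p0 = softmax(logits_f0)
    a0 = rng.choice(2, p=p0)
    p1 = softmax(logits_f1[a0])
    a1 = rng.choice(2, p=p1)

    # Trajectory-balance residual: logZ + sum log P_F - log R.
    # (The state graph here is a tree, so the backward-policy term vanishes.)
    delta = log_z + np.log(p0[a0]) + np.log(p1[a1]) - np.log(reward[(a0, a1)])

    # Manual gradients of 0.5 * delta**2 w.r.t. the logits and logZ.
    g0 = -p0.copy(); g0[a0] += 1.0      # d log p0[a0] / d logits_f0
    g1 = -p1.copy(); g1[a1] += 1.0      # d log p1[a1] / d logits_f1[a0]
    logits_f0 -= lr * delta * g0
    logits_f1[a0] -= lr * delta * g1
    log_z -= lr * delta

# After training, complete rows should be sampled roughly in proportion to reward.
p0 = softmax(logits_f0)
probs = {(i, j): p0[i] * softmax(logits_f1[i])[j]
         for i in range(2) for j in range(2)}
```

Because generation is sequential, every intermediate state (a partially filled row) is explicit, which is the traceability property the abstract emphasizes: the conditional distribution of each feature given the features generated so far can be read directly off the policy.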
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5075