Enhanced Model-agnostic Training of Deep Tabular Generation Models

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: mixture of Gaussians, tabular data generation, score-based generative model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Despite active research on tabular data synthesis, state-of-the-art methods continue to face challenges arising from the complex distribution of tabular data. We claim that this difficulty can be alleviated by simplifying the distribution via Gaussian decomposition. In this paper, we propose a training method, Gaussian Decomposition-based Generation of Tabular data (GADGET), which can be applied to any generative model for tabular data. The method i) decomposes the complicated distribution of tabular data into a mixture of $K$ Gaussian distributions, and ii) trains one model for each decomposed Gaussian distribution, aided by our proposed self-paced learning algorithm. In other words, we do not stop at using a Gaussian mixture model to discover $K$ simplified distributions; we also use their surrogate density functions to design our self-paced learning algorithm. In experiments with 11 datasets and 8 baselines, we show that GADGET greatly improves existing tabular data synthesis methods. In particular, a score-based generative model trained under our GADGET framework achieves state-of-the-art performance in terms of sampling quality and diversity.
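The two-step recipe in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `GaussianMixture` for step (i), uses hard component assignments for the per-component partitions of step (ii), and treats each row's responsibility under its assigned component as a stand-in for the surrogate density that GADGET's self-paced learning algorithm would use (the actual curriculum schedule is paper-specific).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy "tabular" data: two clusters standing in for a complex distribution.
X = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(200, 2))])

K = 2  # number of Gaussian components (a hyperparameter)
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)

# Step (i): hard-assign each row to its most likely component;
# step (ii) would then train one generative model per partition.
labels = gmm.predict(X)
partitions = [X[labels == k] for k in range(K)]

# Responsibility of each row under its assigned component, usable as a
# surrogate-density weight for a self-paced learning curriculum.
resp = gmm.predict_proba(X)
weights = resp[np.arange(len(X)), labels]
print([len(p) for p in partitions], weights.shape)
```

Each downstream generative model then only has to fit one (approximately Gaussian) partition, which is the simplification the abstract argues for.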
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4767