Knowledge-Enhanced Tabular Data Generation

ICLR 2026 Conference Submission24941 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Tabular data generation

Abstract: Tabular data generation methods synthesize artificial samples by learning the distribution of the training data. However, most existing methods are purely data-driven, and they perform poorly when training samples are scarce or when the training data is shifted relative to the true data distribution. In many real-world scenarios, data owners can provide additional knowledge beyond the raw data, such as domain-specific descriptions or dependencies among features. Motivated by this, we categorize the types of knowledge that can effectively support tabular data generation and incorporate selected knowledge as auxiliary information to guide the generation process. To this end, we propose KTGen, a $\textbf{K}$nowledge-enhanced $\textbf{T}$abular data $\textbf{Gen}$eration framework. KTGen leverages auxiliary information by training a correction network in the latent space produced by a VAE, so that the generated data aligns with the auxiliary information. Our experiments demonstrate that, when training on limited, biased data, incorporating auxiliary information brings the distribution of synthetic samples closer to the true data distribution and improves the performance of downstream models trained on the synthetic samples.
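
The idea of correcting latent codes toward auxiliary information can be illustrated with a deliberately small sketch. This is not the KTGen implementation: the linear `decode` function stands in for a frozen VAE decoder, the "correction network" is reduced to a single learnable latent shift `delta`, and the auxiliary knowledge is assumed to be a known true feature mean (`target_mean`). All names and values here are hypothetical and chosen only for illustration.

```python
import random

def decode(z, a=2.0, b=1.0):
    """Toy stand-in for a frozen VAE decoder: maps a latent scalar to one feature."""
    return a * z + b

def train_correction(latents, target_mean, a=2.0, b=1.0, reg=0.01, lr=0.05, steps=500):
    """Learn a latent shift `delta` so the decoded mean matches `target_mean`.

    Loss: (mean(decode(z + delta)) - target_mean)^2 + reg * delta^2
    The regularizer keeps corrected latents close to the original ones.
    """
    delta = 0.0
    n = len(latents)
    for _ in range(steps):
        decoded_mean = sum(decode(z + delta, a, b) for z in latents) / n
        # Analytic gradient of the loss with respect to delta.
        grad = 2.0 * (decoded_mean - target_mean) * a + 2.0 * reg * delta
        delta -= lr * grad
    return delta

random.seed(0)
# Latents sampled from the prior of a model fit on biased data (assumed setup).
latents = [random.gauss(0.0, 1.0) for _ in range(1000)]
target_mean = 3.0  # hypothetical auxiliary knowledge: the true feature mean

delta = train_correction(latents, target_mean)
before = sum(decode(z) for z in latents) / len(latents)
after = sum(decode(z + delta) for z in latents) / len(latents)
print(f"decoded mean before correction: {before:.3f}")
print(f"decoded mean after correction:  {after:.3f}")
```

In this toy setting the decoder is left untouched and only the latent codes are adjusted, which mirrors the abstract's description at a high level; an actual correction network would be a learned function of the latent code rather than a constant shift.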

Primary Area: generative models

Submission Number: 24941
