Keywords: Tabular Data Generation, Diffusion Models, Generative Models
TL;DR: Tabular diffusion architecture trained with a unified continuous representation.
Abstract: Diffusion models for tabular data generation face a trade-off between separate and unified data representations. The former struggles to jointly capture multi-modal distributions, while the latter often relies on sparse, suboptimal encodings and incurs high computational costs. In this work, we address the limitations of the latter by presenting TabRep, a diffusion architecture trained with a unified, continuous representation tailored for tabular data. Motivated by geometric insights into the data manifold, our representation is dense, separable, and preserves intrinsic relationships. TabRep achieves state-of-the-art performance, synthesizing data that surpasses the original data in downstream quality while maintaining privacy and efficiency.
Submission Number: 17