Keywords: tabular data, flow matching, generative modeling, synthetic data
TL;DR: A cascaded flow matching framework that generates details in tabular data conditioned on low-resolution features.
Abstract: Advances in generative modeling have recently been adapted to heterogeneous tabular data. However, generating mixed-type features that combine discrete values with an otherwise continuous distribution remains challenging.
We advance the state of the art in diffusion-based generative models for heterogeneous tabular data with a cascaded approach.
Specifically, we treat categorical variables and numerical features as low- and high-resolution representations of a tabular data row, respectively. We derive a feature-wise low-resolution representation of numerical features that directly accommodates mixed-type features, including missing values and discrete outcomes with non-zero probability mass.
This coarse information is leveraged to guide the high-resolution flow matching model via a novel conditional probability path.
We prove that this guidance lowers the transport cost of the flow matching model.
The results illustrate that our cascaded pipeline generates more realistic samples and learns the details of distributions more accurately.
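To make the cascaded idea concrete, the following is a minimal illustrative sketch in Python, not the authors' derivation. It assumes a simple coarsening scheme (a missing-value code, point-mass codes, and quantile bins), a Gaussian source fitted within each coarse group, and a standard linear interpolation path. The names `coarsen`, `conditioned_source`, `path_point`, and `demo_costs` are hypothetical. The demo compares the mean squared displacement between source and data samples for an unconditioned standard normal source and for the group-conditioned source.

```python
import numpy as np


def coarsen(x, n_bins=4, point_masses=(0.0,)):
    """Map a numerical column to integer codes (hypothetical scheme).

    Code 0 marks missing values, codes 1..len(point_masses) mark exact
    point-mass values, and the remaining codes are quantile bins of the
    continuous part.
    """
    x = np.asarray(x, dtype=float)
    codes = np.zeros(x.shape, dtype=int)  # default code 0 = missing
    missing = np.isnan(x)
    special = np.zeros(x.shape, dtype=bool)
    for k, v in enumerate(point_masses, start=1):
        hit = x == v  # NaN never compares equal, so missing stays 0
        codes[hit] = k
        special |= hit
    cont = ~missing & ~special
    if cont.any():
        # Interior quantile edges of the continuous part.
        edges = np.quantile(x[cont], np.linspace(0, 1, n_bins + 1)[1:-1])
        bins = np.searchsorted(edges, x[cont], side="right")
        codes[cont] = len(point_masses) + 1 + bins
    return codes


def conditioned_source(x1, codes, rng):
    """Draw source samples from a Gaussian fitted within each coarse group."""
    x0 = np.empty_like(x1)
    for c in np.unique(codes):
        idx = codes == c
        mu, sd = x1[idx].mean(), x1[idx].std()
        x0[idx] = rng.normal(mu, sd, size=idx.sum())
    return x0


def path_point(x0, x1, t):
    """Linear interpolation path; its target velocity is x1 - x0."""
    return (1.0 - t) * x0 + t * x1


def demo_costs(seed=0):
    """Return (unconditioned cost, conditioned cost) on toy zero-inflated data."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([np.zeros(300), rng.lognormal(size=700)])
    codes = coarsen(x, n_bins=4)          # coarse, low-resolution view
    x1 = (x - x.mean()) / x.std()         # standardised high-resolution target
    x0_base = rng.normal(size=x1.shape)   # unconditioned N(0, 1) source
    x0_cond = conditioned_source(x1, codes, rng)

    def cost(x0):
        return float(np.mean((x1 - x0) ** 2))

    return cost(x0_base), cost(x0_cond)


if __name__ == "__main__":
    base, cond = demo_costs()
    print(f"unconditioned: {base:.3f}  conditioned: {cond:.3f}")
```

In this toy setting the conditioned source sits closer to the data within each coarse group, so the average squared displacement along the path is smaller. This only illustrates the intuition behind the transport-cost claim under the stated assumptions; it does not reproduce the paper's probability path or its proof.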
Primary Area: applications to computer vision, audio, language, and other modalities
Supplementary Material: zip
Submission Number: 18697