TL;DR: TabbyFlow extends flow matching to tabular data generation by leveraging exponential family distributions to handle mixed data types efficiently.
Abstract: While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite its ubiquity in real-world applications.
To address this gap, we develop *TabbyFlow*, a Variational Flow Matching (VFM) method for tabular data generation.
To apply VFM to data with mixed continuous and discrete features, we introduce **Exponential Family Variational Flow Matching (EF-VFM)**, which represents heterogeneous data types using a general exponential family distribution.
This yields an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables.
We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences.
Evaluation on tabular data benchmarks shows that *TabbyFlow* achieves state-of-the-art performance relative to baselines.
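The moment-matching idea mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' implementation; it only verifies a standard exponential-family fact: the gradient of the negative log-likelihood with respect to the natural (or mean) parameter equals the model's expected sufficient statistic minus the observed one. The helper names (`categorical_nll`, `gaussian_nll`, `numerical_grad`) and the toy values are assumptions made for illustration, with a categorical feature and a unit-variance Gaussian feature standing in for a mixed-type row.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: the mean of the one-hot sufficient statistic.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def categorical_nll(logits, k):
    # -log softmax(logits)[k] = logsumexp(logits) - logits[k]
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum()) - logits[k]

def gaussian_nll(mu, x):
    # Unit-variance Gaussian NLL, dropping the additive constant.
    return 0.5 * (x - mu) ** 2

def numerical_grad(f, theta, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Discrete feature (toy values): observed class k out of 3 categories.
logits = np.array([0.2, -1.0, 0.5])
k = 2
one_hot = np.eye(3)[k]
grad_cat = numerical_grad(lambda l: categorical_nll(l, k), logits)
moment_gap_cat = softmax(logits) - one_hot  # E_q[T(x)] - T(x_observed)

# Continuous feature (toy values): observed value x, predicted mean mu.
mu = np.array([0.3])
x = 1.7
grad_gauss = numerical_grad(lambda m: gaussian_nll(m[0], x), mu)
moment_gap_gauss = mu - x  # E_q[x] - x_observed

print(np.allclose(grad_cat, moment_gap_cat, atol=1e-5))
print(np.allclose(grad_gauss, moment_gap_gauss, atol=1e-5))
```

In both cases the gradient vanishes exactly when the predicted moment equals the observed statistic, which is why a single likelihood-based objective can cover continuous and discrete columns. The same log-partition structure is what relates such objectives to Bregman divergences: the KL divergence between two members of an exponential family is the Bregman divergence generated by the log-partition function.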
Lay Summary: Recent techniques for creating realistic artificial data have greatly improved image and text generation.
However, generating realistic tabular data, for example, patient records, is still challenging, even though this type of data is everywhere in practical applications.
To address this, we introduce TabbyFlow, a new method specifically designed to generate realistic tabular data.
Tabular data often includes different kinds of information: numbers, categories, yes/no answers, etc.
Our approach, called **Exponential Family Variational Flow Matching (EF-VFM)**, can handle all these different data types smoothly.
It models numerical and categorical data jointly, so that the generated data closely matches real data.
Our approach simplifies and improves the way artificial tabular data is generated, making it more accurate and realistic.
Tests show that *TabbyFlow* outperforms other leading methods, producing data that better matches real-world tabular information.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Variational Flow Matching, Exponential Families, Tabular Data
Submission Number: 16008