TabFlowM: Lightweight Flow Matching for Mixed-Type Tabular Data Synthesis in Latent Space

TMLR Paper8106 Authors

26 Mar 2026 (modified: 20 Jun 2026) · Decision pending for TMLR · CC BY 4.0

Abstract: Generative modeling for mixed-type tabular data has recently been dominated by diffusion-based methods, but their gains often come with schedule design, time-dependent score parameterization, and multi-step solvers that increase computational overhead and tuning difficulty. We present TabFlowM, a lightweight framework that asks a more targeted question: once mixed-type records are mapped into a decoder-compatible continuous transport space, is diffusion-style score learning still necessary? TabFlowM answers this by training a single time-conditioned velocity field via flow matching to deterministically transport Gaussian noise to the data distribution in the latent token space, replacing diffusion-specific score estimation and scheduling machinery with direct velocity regression on a closed-form coupling path. Experiments on six real-world benchmark datasets show that TabFlowM attains the best average rank in composite distributional fidelity, which jointly accounts for marginal and pairwise divergence. It further achieves the strongest column-wise MLE on 5 out of 6 datasets. Across the UCI suite, TabFlowM also trains in markedly less time than the strongest diffusion baselines, avoiding the severe training-time scaling they exhibit on larger datasets. Finally, on a million-scale fraud dataset with class ratios exceeding 100:1, where unconditional fidelity can decouple from rare-event predictive utility, TabFlowM achieves the strongest average AUC-PR while maintaining competitive fidelity and runtime. These findings suggest that, under an appropriate transport interface for mixed-type data, a minimalist flow-matching generator can recover much of the benefit commonly associated with heavier diffusion models while substantially reducing computational and conceptual complexity.
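
To make "direct velocity regression on a closed-form coupling path" concrete, here is a minimal NumPy sketch of the generic flow-matching recipe, not the paper's implementation. The abstract does not specify the path, so the sketch assumes the common straight-line interpolation x_t = (1 - t) x_0 + t x_1, whose target velocity is x_1 - x_0. The function names (`fm_training_batch`, `fm_loss`, `euler_sample`) and the Euler sampler are illustrative assumptions, and the encoder and decoder that map mixed-type records to and from the latent space are omitted.

```python
import numpy as np

def fm_training_batch(x1, rng):
    """Build one regression batch from latent data x1 of shape (n, d).

    Assumes a straight-line path between noise and data (an assumption;
    the abstract only says the path is closed-form).
    """
    x0 = rng.standard_normal(x1.shape)       # Gaussian noise source
    t = rng.uniform(size=(x1.shape[0], 1))   # one time in [0, 1) per sample
    xt = (1.0 - t) * x0 + t * x1             # point on the path at time t
    target = x1 - x0                         # velocity of that path
    return xt, t, target

def fm_loss(velocity_fn, xt, t, target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((velocity_fn(xt, t) - target) ** 2))

def euler_sample(velocity_fn, x0, n_steps=50):
    """Deterministically transport noise x0 by integrating dx/dt = v(x, t)."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In practice `velocity_fn` would be a trained time-conditioned network. As a sanity check on the sketch, when every data point equals a single vector `mu`, the exact velocity field under this path is `(mu - x) / (1 - t)`: it gives zero regression loss, and Euler integration carries any starting noise to `mu`.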

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Qitian_Wu1

Submission Number: 8106
