Modelling Complex Tabular Datasets with a Mixture of Diverse Generative Models.

TMLR Paper7435 Authors

10 Feb 2026 (modified: 16 May 2026). Decision pending for TMLR. License: CC BY 4.0

Abstract: Generative models are widely used, yet they often struggle to capture the multi-modal structure of complex tabular datasets. We address this challenge by introducing a novel framework that employs mixtures of diverse generators, each specialized to different regions of the data space. Our method proceeds in two stages: first, generators are assigned to data clusters via a compute-efficient bandit-based allocation strategy; second, cluster assignments are refined through an iterative procedure inspired by the Expectation–Maximization (EM) framework. Crucially, our approach is designed for settings where the generators’ likelihoods are intractable and only generated data samples are accessible. We provide theoretical guarantees by establishing convergence rates of the mixture distribution under approximate cluster identification. Empirical evaluations on both synthetic and real-world tabular datasets demonstrate that our approach produces high-quality synthetic data, validating its effectiveness in challenging generative modeling tasks.

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Farzan_Farnia1

Submission Number: 7435
