Keywords: missing value imputation, tabular data generation, diffusion models
TL;DR: ImpuGen achieves state-of-the-art performance for imputation and tabular data synthesis using two new task-aligned sampling strategies.
Abstract: Missing-value imputation and tabular data synthesis both rely on distribution modeling, but they pursue different goals: imputation requires pointwise accuracy, whereas generation requires diversity and fidelity. We present ImpuGen, a single conditional diffusion model that achieves both objectives. ImpuGen employs two efficient, task-aligned sampling strategies: (i) zero-start sampling, which yields accurate, deterministic imputations without multiple-sample averaging; and (ii) distribution-matching refinement (DMR), which randomly remasks columns with probability \(p\) and regenerates them to reduce distributional mismatch. Across nine public datasets, ImpuGen surpasses eleven imputation baselines, reducing MAE by up to 16\%, and matches the state of the art on five generation evaluation metrics.
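The remask-and-regenerate idea behind DMR can be illustrated with a minimal sketch. It is not the authors' implementation: the names `dmr_round` and `column_mean_regenerate` are hypothetical, the choice to remask each cell independently with probability `p` is an assumption (the abstract does not say whether remasking is per cell or per whole column), and a toy column-mean filler stands in for the conditional diffusion sampler.

```python
import random

def dmr_round(rows, p, regenerate, rng):
    """One remask-and-regenerate round (illustrative sketch only)."""
    # Assumption: each cell is remasked independently with probability p.
    remask = [[rng.random() < p for _ in row] for row in rows]
    # None marks a cell that is hidden from the sampler.
    conditioned = [[None if m else v for v, m in zip(row, mrow)]
                   for row, mrow in zip(rows, remask)]
    # `regenerate` is a hypothetical stand-in for the conditional sampler.
    proposal = regenerate(conditioned)
    # Keep untouched cells as they were; replace only the remasked ones.
    refined = [[proposal[i][j] if remask[i][j] else rows[i][j]
                for j in range(len(rows[i]))]
               for i in range(len(rows))]
    return refined, remask

def column_mean_regenerate(conditioned):
    """Toy sampler: fill each hidden cell with the mean of its column's visible cells."""
    n_cols = len(conditioned[0])
    means = []
    for j in range(n_cols):
        kept = [row[j] for row in conditioned if row[j] is not None]
        means.append(sum(kept) / len(kept) if kept else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in conditioned]

rng = random.Random(0)
table = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
refined, remask = dmr_round(table, p=0.3, regenerate=column_mean_regenerate, rng=rng)
```

In this sketch, `p` controls how much of each generated sample is resampled per round: `p = 0` leaves the table unchanged, while `p = 1` regenerates every cell with nothing left to condition on.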
Supplementary Material: zip
Primary Area: generative models
Submission Number: 22549