Rethinking the Diffusion Models for Missing Data Imputation: A Gradient Flow Perspective

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY-NC-ND 4.0
Keywords: Missing Data Imputation, Gradient Flow, Reproducing Kernel Hilbert Space, Functional Optimization
TL;DR: We propose NewImp, a novel, easy-to-implement imputation approach for numerical tabular data based on a joint Wasserstein gradient flow.
Abstract: Diffusion models have demonstrated competitive performance on the missing data imputation (MDI) task. However, directly applying diffusion models to MDI yields suboptimal performance due to two primary defects. First, the sample diversity promoted by diffusion models hinders the accurate inference of missing values. Second, data masking reduces the observable entries available for model training, degrading imputation performance. To address these challenges, we introduce the $\underline{\text{N}}$egative $\underline{\text{E}}$ntropy-regularized $\underline{\text{W}}$asserstein gradient flow for $\underline{\text{Imp}}$utation (NewImp), which enhances diffusion models for MDI from a gradient flow perspective. To handle the first defect, we incorporate a negative entropy regularization term into the cost functional to suppress diversity and improve accuracy. To handle the second defect, we show that the imputation procedure of NewImp, induced by the conditional distribution-related cost functional, can equivalently be replaced by one induced by the joint distribution, thereby naturally eliminating the need for data masking. Extensive experiments validate the effectiveness of our method. Code is available at [https://github.com/JustusvLiebig/NewImp](https://github.com/JustusvLiebig/NewImp).
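To make the gradient-flow view concrete, below is a minimal NumPy sketch of one kernelized (RKHS-discretized) Wasserstein gradient flow update on the missing entries, in the spirit of the abstract. Everything here is an illustrative assumption rather than the paper's actual implementation: the RBF kernel, the `score_fn` placeholder (a score model of the *joint* data distribution, which is what lets training proceed without masking observed entries), and the hypothetical `lam` knob, where `lam = 1` recovers the standard entropy-driven repulsion and a negative value mimics the negative-entropy regularization that suppresses sample diversity.

```python
import numpy as np

def rbf_kernel_and_grad(X, h=1.0):
    """RBF kernel matrix K[i, j] = k(x_i, x_j) and its gradient w.r.t. x_i."""
    diff = X[:, None, :] - X[None, :, :]             # (n, n, d): x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # (n, n)
    grad_K = -diff * K[:, :, None] / h ** 2          # (n, n, d): d k(x_i, x_j) / d x_i
    return K, grad_K

def imputation_flow_step(X, observed, score_fn, lam=1.0, step=1e-2, h=1.0):
    """One kernelized gradient-flow update of the missing entries.

    X        : (n, d) array with current imputations filled in
    observed : (n, d) boolean mask, True where the entry is observed
    score_fn : (n, d) -> (n, d), grad log p(x) under the joint distribution
    lam      : hypothetical weight on the entropy term; lam = 1 gives the
               usual entropic repulsion, while a negative value contracts
               the particles, mimicking negative-entropy regularization
    """
    n = X.shape[0]
    K, grad_K = rbf_kernel_and_grad(X, h)
    drive = K @ score_fn(X) / n           # kernel-smoothed score (transport term)
    repulsion = -grad_K.sum(axis=1) / n   # entropy-driven term (promotes diversity)
    X_new = X + step * (drive + lam * repulsion)
    # Observed entries stay clamped to the data; only missing entries move.
    return np.where(observed, X, X_new)
```

Iterating this step from a random initialization of the missing entries, with the observed entries held fixed, gives the basic particle-flow imputation loop this sketch is meant to convey; the released repository contains the method as actually proposed.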
Primary Area: Probabilistic methods (for example: variational inference, Gaussian processes)
Submission Number: 1850