Keywords: Time Series Imputation, Score-based Generative Models, Latent Diffusion
TL;DR: Hierarchical score-based generative model that corrects diffusion-model bias for accurate probabilistic time-series imputation.
Abstract: Missing data remains a key challenge in multivariate time series modeling, often degrading downstream performance. Recent score-based generative models show strong potential for high-quality imputation, yet most ignore the original missing data during training, since ground truth is unavailable, resulting in biased score estimation. We theoretically analyze the effect of missingness on score-based modeling under the denoising diffusion probabilistic model (DDPM) framework. Our findings reveal that ignoring original missing patterns—especially under high missing rates or strong inter-variable correlations—can significantly distort the learned score function even at non-missing points. To overcome this, we propose the Hierarchical Score-Based Generative Model (HSGM) for probabilistic time series imputation. HSGM integrates latent-space and observation-space diffusion in a layer-wise refinement framework grounded in the chain rule of probability. A pretrained Variational Autoencoder (VAE) with normalizing flows captures complex latent distributions, while a continuous-time variance-preserving diffusion process (VP-SDE) operates in latent space. A cross-attention mechanism between the original and denoised latent states enhances the fidelity and resolution of the generative outputs, and an observation-space diffusion module further refines the final imputations. Experiments on four benchmark datasets show that HSGM achieves more accurate imputations with tighter uncertainty estimates than existing methods, while effectively correcting score-function bias, establishing a new state of the art in time series imputation.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 6186