MirrorTD: Constraint-Aware Diffusion Models for Mixed-Type EHR Time Series Generation

TMLR Paper8934 Authors

14 May 2026 (modified: 01 Jun 2026), Under review for TMLR, CC BY 4.0
Abstract: The generation of synthetic electronic health record (EHR) data is a critical enabler for machine learning in healthcare. However, it remains challenging because clinical time series are mixed-type (numerical and categorical), high-dimensional, temporally structured, and subject to constraints such as data validity and patient survival status. In response to these challenges, we propose `MirrorTD`, a multi-stage score-based diffusion framework that integrates mixed-type Gaussian and discrete diffusion processes with a mirror-mapping variational autoencoder to embed constraints. Specifically, we embed constrained indicators into a continuous latent space via the mirror mapping and use an efficient spatio-temporal attention mechanism to capture temporal dynamics and cross-feature dependencies. Experiments on three real-world ICU datasets show that our method produces realistic, diverse, and constraint-compliant synthetic EHRs, advancing synthetic time-series generation for critical-care cohorts.
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Sylvain_Le_Corff1
Submission Number: 8934