Keywords: health timeseries, crossmodal diffusion, latent alignment
Abstract: Paired structured time series from heterogeneous health sensors often observe the same evolving physiological or biomechanical state through different measurement channels. We study how this underlying structure can be used to improve bidirectional conditional diffusion models, $p_\theta(X\mid Y)$ and $p_\phi(Y\mid X)$, for such paired data. We introduce \method, a lightweight training objective that aligns local encoder neighborhoods at matched diffusion steps through a windowed sequence-contrastive loss and a covariance-matching loss. On paired locomotion signals and canonical dynamical systems, stepwise alignment improves cross-modal reconstruction fidelity, distributional similarity, and downstream representation quality over the same diffusion backbone trained without alignment. These results suggest that diffusion-step local alignment is a useful inductive bias for structured health time series with shared underlying dynamics.
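Illustrative sketch (not the authors' implementation): the abstract names two alignment terms, a windowed sequence-contrastive loss and a covariance-matching loss, without giving their exact form. The NumPy code below shows one plausible way to combine them, assuming that `hx` and `hy` are encoder features of shape (T, D) taken from the two modality encoders at the same diffusion step, that windows are non-overlapping and mean-pooled, that the contrastive term is a symmetric InfoNCE loss, and that the covariance term is a mean squared difference of feature covariances. All function names, the window size, the temperature, and the weight `lam` are hypothetical.

```python
import numpy as np

def window_embeddings(h, window):
    """Mean-pool features h of shape (T, D) over non-overlapping windows."""
    T, D = h.shape
    n = T // window  # trailing steps that do not fill a window are dropped
    return h[: n * window].reshape(n, window, D).mean(axis=1)

def contrastive_loss(zx, zy, temperature=0.1, eps=1e-8):
    """Symmetric InfoNCE (assumed form): window i of X pairs with window i of Y."""
    zx = zx / (np.linalg.norm(zx, axis=1, keepdims=True) + eps)
    zy = zy / (np.linalg.norm(zy, axis=1, keepdims=True) + eps)
    logits = zx @ zy.T / temperature  # (n, n) scaled cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def covariance_loss(zx, zy):
    """Mean squared difference between the two feature covariance matrices."""
    cx = np.cov(zx, rowvar=False)
    cy = np.cov(zy, rowvar=False)
    return float(np.mean((cx - cy) ** 2))

def alignment_loss(hx, hy, window=8, lam=1.0):
    """Hypothetical combined objective for features at one matched diffusion step."""
    zx = window_embeddings(hx, window)
    zy = window_embeddings(hy, window)
    return contrastive_loss(zx, zy) + lam * covariance_loss(zx, zy)

# Toy usage: identical features stand in for perfectly aligned encoders.
rng = np.random.default_rng(0)
hx = rng.normal(size=(64, 4))
aligned = alignment_loss(hx, hx)
# Shifting one sequence by a full window mismatches every window pair.
misaligned = alignment_loss(hx, np.roll(hx, 8, axis=0))
```

In this toy setup the aligned pair yields a lower loss than the shifted pair, which is the behavior an alignment term of this kind is meant to reward. In training, such a term would be added to the usual conditional diffusion loss for each direction; how the paper weights or schedules it is not stated in the abstract.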
Submission Number: 125