HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 spotlight, License: CC BY 4.0
TL;DR: Giving each time-series variable a learnable identity embedding lets a Transformer impute missing values more accurately by maintaining stable cross-feature associations even when observed data is severely incomplete.
Abstract: Time series imputation benefits from leveraging cross-feature correlations, yet existing attention-based methods re-discover feature relationships at each layer and lack persistent anchors that keep representations consistent. To address this, we propose HELIX, which assigns each feature a learnable feature identity: a persistent embedding that captures the feature's intrinsic semantic properties throughout the network. Unlike graph-based methods that rely on predefined topology and assume homogeneous spatial relationships, HELIX learns arbitrary feature dependencies end-to-end from temporal co-variation, naturally handling datasets where features mix spatial locations with semantic variables. Integrated with hybrid temporal-feature attention, HELIX achieves state-of-the-art performance, surpassing all 16 baselines on 5 public datasets across 21 experimental settings in our evaluation. Furthermore, our mechanistic analysis reveals that HELIX progressively aligns learned feature identities and dependencies with latent physical and semantic structure across layers, demonstrating that it more effectively translates cross-feature structure into imputation accuracy.
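The idea of a persistent feature identity can be illustrated with a minimal sketch. This is not the HELIX implementation: the function names (`make_identities`, `embed_token`, `feature_attention`), the zero mask token, and the projection-free single-head attention are all assumptions made for illustration, and training, temporal attention, and multi-layer stacking are omitted. The sketch only shows that a token for a missing value still carries its feature's identity vector, so attention across features can tell variables apart even when their values are absent.

```python
import math
import random


def make_identities(num_features, dim, seed=0):
    """Hypothetical stand-in for a learnable table: one vector per feature.

    In a real model these would be trainable parameters; here they are
    simply initialized with small random values.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_features)]


def embed_token(value, observed, identity, value_proj, mask_token):
    """Build one token: projected value (or a mask token if missing) plus identity."""
    base = [value * w for w in value_proj] if observed else list(mask_token)
    return [b + e for b, e in zip(base, identity)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(tokens):
    """Dot-product attention across the features of a single time step.

    Simplification (assumption): no learned query/key/value projections.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)])
    return outputs


# Usage: three features at one time step; the second value is missing.
identities = make_identities(num_features=3, dim=4)
value_proj = [0.1, 0.2, 0.3, 0.4]   # hypothetical value projection weights
mask_token = [0.0, 0.0, 0.0, 0.0]   # hypothetical embedding for "missing"
values = [0.5, None, -1.2]

tokens = [
    embed_token(v if v is not None else 0.0, v is not None, identities[f], value_proj, mask_token)
    for f, v in enumerate(values)
]
mixed = feature_attention(tokens)
```

Because the identity is added to every token regardless of whether the value was observed, two missing features never collapse to the same representation in this sketch, which is the property the abstract attributes to persistent feature identities.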
Lay Summary: Sensors and monitoring devices frequently produce incomplete records. A hospital monitor may lose a patient's blood pressure reading for a few minutes; an air quality station may go offline during a dust storm. Filling in these gaps accurately matters because many downstream decisions, from clinical alerts to pollution forecasts, depend on having complete data. Most existing approaches treat every variable the same way and try to figure out relationships between variables from scratch at every step of processing. When too many readings are missing at once, there is simply not enough information left for the model to tell which variables should inform which. Our method, HELIX, takes a different route. It assigns each variable a small, learnable tag that acts like a name badge: even when a variable's value is entirely absent, the model still knows what that variable is and which other variables tend to move with it. These tags are not designed by hand; they are learned automatically from the data during training. On an air quality dataset, for example, the model discovered that nearby stations should have similar tags, purely from the fact that their readings co-vary, without ever seeing a map. We tested HELIX on five public datasets spanning clinical monitoring, air quality, electricity, and traffic, under a wide range of missingness scenarios. It ranked first in all 21 experimental settings against 16 baselines. Beyond accuracy, we find that the learned tags recover meaningful structure, grouping geographically close stations together and clustering clinically related vital signs, suggesting that the model captures genuine domain relationships rather than superficial statistical shortcuts.
Link To Code: https://github.com/milaogou/HELIX
Primary Area: Deep Learning->Sequential Models, Time series
Keywords: Time Series, Imputation, Missing Data, Attention Model, Deep Learning
Originally Submitted PDF: pdf
Submission Number: 31744