Keywords: Causal, Multimodal, Robust Time Series Estimation, Sensor
Abstract: Many systems require real-time fusion of multi-sensor streams to produce causal estimates that drive online decisions. These systems must distill information across sensors while contending with missing and degraded measurements. As the number of sensors grows, both observable dropouts and latent degradation become increasingly likely, making multi-sensor, multi-task processing brittle for conventional sequential models. We propose two plug-in modules that attach to any unidirectional backbone (e.g., an LSTM or a causal Transformer): (i) Subchannel Hierarchical Input Embedding (SHIE) forms channel-level embeddings from fine-grained subchannels, so that degraded values perturb only a local slice of the representation; (ii) the Repetitive Cross-Modal Fusion Transformer (RCFT) performs iterative sensor-wise (cross-modal) attention at each time step, fusing concurrent measurements across sensors. Both modules support many-to-many estimation and are agnostic to the loss function and to input/output shapes. We augment vanilla LSTM and Transformer backbones with SHIE and RCFT and evaluate on four multi-sensor datasets: electric grid state estimation, physical activity monitoring, room occupancy prediction, and cognitive load estimation. Across all four datasets, the augmented models outperform their baselines and remain accurate as missing-data rates rise far beyond those seen during training. Ablations isolate the contribution of each module, and the combined approach improves robustness without relying on a separate imputation step.
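To make the two modules concrete, here is a minimal PyTorch sketch of how SHIE and RCFT could be read from the abstract alone. Every specific choice below, including the per-subchannel linear embeddings, weight-shared attention iterations, mean-pooling over sensors, and all names and dimensions (`n_subchannels`, `sub_dim`, `n_iters`), is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SHIE(nn.Module):
    """Subchannel Hierarchical Input Embedding (sketch): each sensor channel
    is split into subchannels that are embedded independently, so a degraded
    value perturbs only its own slice of the channel embedding."""

    def __init__(self, n_channels: int, n_subchannels: int, sub_dim: int):
        super().__init__()
        # One tiny embedding per (channel, subchannel); scalar inputs assumed.
        self.sub_embed = nn.ModuleList(
            nn.ModuleList(nn.Linear(1, sub_dim) for _ in range(n_subchannels))
            for _ in range(n_channels)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, n_subchannels) measurements at one time step.
        chans = []
        for c, subs in enumerate(self.sub_embed):
            parts = [f(x[:, c, s:s + 1]) for s, f in enumerate(subs)]
            chans.append(torch.cat(parts, dim=-1))  # (batch, n_sub * sub_dim)
        return torch.stack(chans, dim=1)            # (batch, n_channels, d)


class RCFT(nn.Module):
    """Repetitive Cross-Modal Fusion Transformer (sketch): sensors attend to
    one another at each time step; the attention block is applied repeatedly
    (here with shared weights, an assumption) before fusion."""

    def __init__(self, d_model: int, n_heads: int = 4, n_iters: int = 3):
        super().__init__()
        self.n_iters = n_iters
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_channels, d_model) channel embeddings from SHIE.
        for _ in range(self.n_iters):
            out, _ = self.attn(tokens, tokens, tokens)
            tokens = self.norm(tokens + out)        # residual + norm
        return tokens.mean(dim=1)                   # fused per-step vector


# Usage: fuse sensors at each time step, then hand the causal sequence to
# any unidirectional backbone (here an LSTM), as the abstract describes.
shie, rcft = SHIE(4, 8, 16), RCFT(d_model=8 * 16)
backbone = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
x = torch.randn(32, 100, 4, 8)                      # (batch, time, chan, sub)
fused = torch.stack([rcft(shie(x[:, t])) for t in range(x.size(1))], dim=1)
y, _ = backbone(fused)                              # causal per-step estimates
```

Because each subchannel owns a disjoint slice of the channel embedding, a corrupted measurement can only distort that slice, which is the locality property the abstract attributes to SHIE; the per-step cross-sensor attention in RCFT is what lets intact sensors compensate for degraded ones before the backbone sees the sequence.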
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9871