Keywords: Time Series Forecasting, Frame-wise Labels, Humanoid Robots, Multimodality, Egocentric Vision
TL;DR: We propose a new challenge for the TSF field, a multimodal method for this task, and two datasets collected as benchmarks.
Abstract: Deep learning models have been increasingly applied to Time Series Forecasting (TSF) in recent years. Transformer-based and MLP-based models have both performed well on many real-world TSF regression benchmarks, and there is ongoing debate as to which family of methods is best. While these benchmarks have drawn much attention, many of the underlying datasets and methods assume that the time series are approximately periodic. In this work, we focus on a new TSF task without periodicity: anticipating falls during humanoid locomotion from egocentric vision and proprioception. When the locomotion trajectories are sufficiently diverse, the periodicity assumption no longer holds. We contribute two new benchmark datasets (one from simulation, one from real hardware), and show that they violate periodicity and that recent deep TSF methods struggle on them. We also propose a novel deep learning architecture that exploits both endogenous and exogenous variables, together with a training process that rigorously enforces i.i.d. sampling of training examples. Our results show statistically significant improvements over prior art across multiple experimental conditions, by 12.73% or more on the real data and 10.40% or more on the simulation data. Code and datasets will be available upon acceptance.
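As an illustration of the i.i.d. window-sampling idea the abstract mentions, here is a minimal PyTorch sketch; it is not the paper's actual pipeline, and all names (`TrajectoryWindowDataset`, `context_len`, `horizon`) are hypothetical. The point is that training windows are drawn uniformly at random with replacement, rather than iterated in temporal order, so consecutive minibatches are not temporally correlated.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler

# Hypothetical sketch of i.i.d. window sampling for TSF training.
# Names and parameters here are illustrative assumptions, not the paper's API.

class TrajectoryWindowDataset(Dataset):
    """Enumerates every valid (context, horizon) window across trajectories."""

    def __init__(self, trajectories, context_len, horizon):
        self.trajectories = trajectories  # list of (T_i, D) float arrays
        self.context_len = context_len
        self.horizon = horizon
        # Index all valid window start positions up front.
        self.windows = [
            (tid, start)
            for tid, traj in enumerate(trajectories)
            for start in range(len(traj) - context_len - horizon + 1)
        ]

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, idx):
        tid, s = self.windows[idx]
        traj = self.trajectories[tid]
        x = traj[s : s + self.context_len]  # past context
        y = traj[s + self.context_len : s + self.context_len + self.horizon]  # future target
        return (torch.as_tensor(x, dtype=torch.float32),
                torch.as_tensor(y, dtype=torch.float32))

# Toy stand-in data: 10 trajectories of 500 steps, 8 proprioceptive channels.
trajs = [np.random.randn(500, 8) for _ in range(10)]
dataset = TrajectoryWindowDataset(trajs, context_len=96, horizon=24)

# RandomSampler with replacement=True draws window indices i.i.d. uniformly,
# unlike a sliding-window iteration in time order.
sampler = RandomSampler(dataset, replacement=True, num_samples=len(dataset))
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 96, 8]) torch.Size([32, 24, 8])
```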
Primary Area: learning on time series and dynamical systems
Submission Number: 3190