Keywords: multivariate time series data, representation learning, imputation-free, missing value
Abstract: Irregularly and asynchronously sampled multivariate time series (MTS) data often contain many missing values. Most existing methods embed features by timestamp, which requires imputing the missing values. However, imputed values can differ drastically from the real values, so predictions based on imputation can be inaccurate. To address this issue, we propose a novel concept, “each value as a token (EVAT),” which treats each feature value as an independent token and thereby bypasses the imputation of missing values. To realize EVAT, we propose scalable numerical embedding, which learns to embed each feature value by automatically discovering the relationships among features. We integrate the proposed embedding method with the Transformer Encoder, yielding the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which can produce accurate predictions given MTS with missing values. We conduct experiments on three distinct electronic health record (EHR) datasets with high missing rates. The experimental results verify SUMMIT's efficacy: it outperforms other models that require imputation.
Submission Number: 4
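The EVAT idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the embedding, so the form used here is an assumption: each feature has its own vector, and an observed value scales that vector to produce its token. The function name `evat_tokens`, the random vectors standing in for learned embeddings, and the returned boolean mask are hypothetical and are not taken from SUMMIT.

```python
import math
import random


def evat_tokens(values, feature_embeddings):
    """Turn one record into per-feature tokens without imputing anything.

    values: list of floats, with NaN marking a missing feature value.
    feature_embeddings: one vector per feature (assumed learnable in practice).
    Returns (tokens, observed), where observed[i] is False for missing values.
    """
    tokens, observed = [], []
    for value, embedding in zip(values, feature_embeddings):
        is_observed = not math.isnan(value)
        observed.append(is_observed)
        # Assumption: the token is the feature vector scaled by the value.
        # A missing value yields a zero placeholder that the mask marks as absent.
        scale = value if is_observed else 0.0
        tokens.append([scale * w for w in embedding])
    return tokens, observed


random.seed(0)
n_features, dim = 4, 8
# Random stand-ins for embeddings that a model would learn.
E = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
x = [0.5, float("nan"), -1.2, float("nan")]  # two of four features missing

tokens, mask = evat_tokens(x, E)
print(mask)  # [True, False, True, False]
```

In a Transformer encoder, a mask like this would typically be passed as a padding mask so that attention ignores the missing positions; no imputed value ever enters the model.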