Learning from Fragmentary Multivariate Time Series Data with Scalable Numerical Embedding

17 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: representation learning, multivariate time series data, missing value, model interpretability, transformer-based model
Abstract: The recent proliferation of transformer-based models in natural language processing and computer vision has significantly impacted fields involving multivariate time series (MTS) data. This research focuses on MTS data sourced from electronic health records (EHR). Unlike other MTS data, EHR exhibits a high prevalence of irregular missing values due to the asynchronous nature of its measurements, which can drastically harm the efficacy of learning algorithms. To tackle this issue effectively, we propose a novel approach termed "SCAlable Numerical Embedding" ($\mathrm{SCANE}$), which treats each value as an independent token to enhance the flexibility of interactions between variables. Moreover, we integrate the transformer encoder with $\mathrm{SCANE}$ (TranSCANE) to form a complete feature extractor for downstream tasks. The attention module within TranSCANE's transformer encoder is specifically tailored to EHR data so that it adeptly circumvents the noise introduced by irregular missing values. To further enhance the interpretability of TranSCANE, we propose a revised rollout attention that comprehensively computes attention weights across all transformer encoder stacks while neglecting the dummy attention assigned to missing values. This empowers us to gain insights into the inner workings of TranSCANE and improve model interpretability. The experimental results reinforce TranSCANE's efficacy, as it attains superior performance on three distinct EHR datasets with high missing rates. We believe that TranSCANE also holds the potential to extend the utility of transformer-based models to diverse domains with high-missing-rate MTS data.
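The abstract describes three components: a value-as-token numerical embedding, a transformer encoder whose attention skips missing entries, and a rollout-style attention aggregation that drops dummy attention on missing values. The snippet below is a minimal, illustrative sketch of these ideas, not the authors' implementation; the class `ScaneEmbedding`, the function `masked_rollout`, and all dimensions (`d_model`, number of variables, masking scheme) are hypothetical choices made only for illustration.

```python
# Illustrative sketch (assumed structure, not the submission's code):
# (1) embed each observed (variable, value) pair as its own token,
# (2) encode with attention that ignores missing entries,
# (3) roll attention out across layers while dropping missing-value tokens.
import torch
import torch.nn as nn


class ScaneEmbedding(nn.Module):
    """Embed each (variable, value) pair as an independent token.

    Each variable has a learnable embedding; the observed value is projected
    and added, so one MTS sample of shape (T, V) becomes T*V tokens.
    """

    def __init__(self, num_vars: int, d_model: int):
        super().__init__()
        self.var_embed = nn.Embedding(num_vars, d_model)
        self.value_proj = nn.Linear(1, d_model)

    def forward(self, values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # values, mask: (batch, T, V); mask is 1 where a value was observed.
        b, t, v = values.shape
        var_ids = torch.arange(v, device=values.device).expand(b, t, v)
        tokens = self.var_embed(var_ids) + self.value_proj(values.unsqueeze(-1))
        tokens = tokens * mask.unsqueeze(-1)      # zero out missing entries
        return tokens.reshape(b, t * v, -1)       # (batch, T*V, d_model)


def masked_rollout(attn_stack: torch.Tensor, token_mask: torch.Tensor) -> torch.Tensor:
    """Aggregate attention across layers, ignoring missing-value tokens.

    attn_stack: (layers, batch, tokens, tokens) attention weights.
    token_mask: (batch, tokens), 1 for observed tokens.
    """
    keep = token_mask.unsqueeze(1) * token_mask.unsqueeze(2)     # pairwise mask
    rollout = None
    for attn in attn_stack:
        a = attn * keep                                          # drop dummy attention
        a = a / a.sum(dim=-1, keepdim=True).clamp(min=1e-9)      # renormalize rows
        rollout = a if rollout is None else torch.bmm(a, rollout)
    return rollout


# Usage: 2 samples, 4 time steps, 3 variables, ~40% of values missing.
values = torch.randn(2, 4, 3)
mask = (torch.rand(2, 4, 3) > 0.4).float()
emb = ScaneEmbedding(num_vars=3, d_model=16)
tokens = emb(values, mask)                                       # (2, 12, 16)
layer = nn.TransformerEncoderLayer(16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(tokens, src_key_padding_mask=(mask.reshape(2, -1) == 0))
```

In this sketch, missing values never receive their own tokens' attention mass: the padding mask keeps them out of the encoder, and `masked_rollout` renormalizes each layer's attention over observed tokens only before multiplying the layers together, which is one plausible reading of "neglecting the dummy attention for missing values."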
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 978