Missing Data Imputation for Large-Scale Longitudinal Physical Activity Data

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: missing data, time series, imputation, wearable, physical activity, large-scale, novel cohort, self-attention model, sparse
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a novel cohort and a sparse self-attention model for the missing value imputation problem on large-scale longitudinal physical activity data.
Abstract: Missing data is ubiquitous in wearable device data. It stems from a combination of user error and hardware issues, and it hinders researchers who seek to monitor users' physical activity in order to understand health-related behaviors and deliver appropriate interventions. The All of Us dataset contains one of the largest collections of longitudinal physical activity data in the world. However, because its missingness patterns vary so widely, only a few works have leveraged it, leaving much of its potential for transformative health impact unrealized. In this work, we consider the problem of imputing missing step counts in large-scale longitudinal physical activity data. We explore the All of Us dataset and extract from it a novel cohort of 100 qualified participants with more than 3 million step count instances. To address missingness, we introduce a sparse self-attention model that captures both absolute and relative time information within a local context window around the missing hourly block. Our results show that (1) the curated cohort exhibits high variability in both activity and missingness patterns, which makes it challenging to model, and (2) our model outperforms a carefully crafted set of baseline methods with statistical significance, supporting its use as a foundation model that could be fine-tuned for downstream tasks. We hope our imputation method will benefit further research by making such a large-scale physical activity dataset easier to use.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6588
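
The abstract describes imputing a missing hourly step count by attending to a local context window using both absolute and relative time information. The sketch below illustrates that general idea only and is not the authors' model: it replaces learned query-key products with fixed, hand-set scores, and the function name `impute_hour` and the parameters `window`, `tau_rel`, and `w_abs` are hypothetical choices made for illustration.

```python
import numpy as np

def impute_hour(steps, observed, t, window=72, tau_rel=24.0, w_abs=2.0):
    """Attention-style estimate of the step count at hour index t.

    steps    : 1-D float array of hourly step counts (missing entries may be NaN)
    observed : 1-D bool array, True where the hourly value was recorded
    window   : half-width of the local context window, in hours (assumed value)
    tau_rel  : scale of the relative-distance penalty (assumed value)
    w_abs    : weight of the hour-of-day similarity term (assumed value)
    """
    idx = np.arange(len(steps))
    rel = idx - t  # relative time: signed offset in hours from the target

    # Absolute time: similarity of hour-of-day, in [-1, 1].
    same_hour = np.cos(2.0 * np.pi * ((idx % 24) - (t % 24)) / 24.0)

    # Hand-set scores standing in for learned attention logits.
    scores = w_abs * same_hour - np.abs(rel) / tau_rel

    # Sparse mask: only observed hours inside the local window, excluding t.
    mask = observed & (np.abs(rel) <= window) & (idx != t)
    if not mask.any():
        return float("nan")  # nothing to attend to

    # Masked, numerically stable softmax over the allowed positions.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores[mask].max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Weighted average over the attended (observed) values only.
    return float(np.sum(weights[mask] * steps[mask]))
```

Because the weights are a softmax over observed positions only, the estimate always lies between the smallest and largest observed step count in the window. A trained model would learn the scoring function from data instead of fixing it by hand.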