timeseriesflattener: A Python package for summarizing features from (medical) time series

Martin Bernstoff, Kenneth Enevoldsen, Jakob Damgaard, Andreas Danielsen, Lasse Hansen

07 Jun 2023 · OpenReview Archive Direct Upload

Abstract: Time series from, e.g., electronic health records often have a large number of variables that are sampled at irregular and differing intervals. Before this type of data can be used for prediction modelling with machine learning methods such as logistic regression or XGBoost (Chen & Guestrin, 2016), the data needs to be reshaped. In essence, the time series need to be flattened so that each prediction time is represented by a vector of predefined length. This vector should hold the set of predictor values and an outcome value. The predictor values can be constructed by aggregating the preceding values in the time series within a certain time window. This flattening lays the foundation for further analyses and requires handling a number of tasks, such as 1) how to deal with missing values, 2) which value to use if none fall within the prediction window, 3) how to handle variables measured multiple times within the chosen time window, and 4) how to handle predictors that attempt to look further back than the start of the dataset. timeseriesflattener aims to simplify this process by providing an easy-to-use and fully specified pipeline for flattening complex time series. timeseriesflattener implements all the functionality required for aggregating features in specific time windows, grouped by, e.g., patient IDs, in a computationally efficient manner. The package is currently used for feature extraction from electronic health records in studies based on the Psychiatric Clinical Outcome Prediction Cohort (PSYCOP) projects (Hansen et al., 2021).
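The flattening described in the abstract can be illustrated with a minimal plain-Python sketch. It is not timeseriesflattener's API: the names `Event`, `window_value` and `flatten` are hypothetical, and so are the specific choices made for tasks 1) to 4). Missing values are dropped, an empty window yields a NaN fallback, multiple values are averaged, and prediction times whose lookbehind window would start before the dataset are skipped. Predictors are assumed to be taken from the window (t - lookbehind, t] and outcomes from (t, t + lookahead], so the two windows never overlap.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One irregularly sampled measurement for one entity (e.g. a patient)."""
    entity_id: int
    timestamp: datetime
    value: Optional[float]  # None marks a missing measurement


def window_value(
    events: Sequence[Event],
    entity_id: int,
    start: datetime,
    end: datetime,
    aggregate: Callable[[List[float]], float],
    fallback: float,
) -> float:
    """Aggregate one entity's non-missing values with start < timestamp <= end."""
    values = [
        e.value
        for e in events
        if e.entity_id == entity_id
        and start < e.timestamp <= end
        and e.value is not None  # (1) missing values are dropped
    ]
    # (3) several values in the window are collapsed into one by `aggregate`;
    # (2) an empty window yields `fallback` instead
    return aggregate(values) if values else fallback


def flatten(
    prediction_times: Sequence[Tuple[int, datetime]],
    predictor_events: Sequence[Event],
    outcome_events: Sequence[Event],
    lookbehind: timedelta,
    lookahead: timedelta,
    dataset_start: datetime,
    aggregate: Callable[[List[float]], float] = mean,
    fallback: float = math.nan,
) -> List[Tuple[int, datetime, float, float]]:
    """Return one fixed-length row (id, time, predictor, outcome) per prediction time."""
    rows = []
    for entity_id, t in prediction_times:
        # (4) skip prediction times whose lookbehind window starts before the data does
        if t - lookbehind < dataset_start:
            continue
        predictor = window_value(
            predictor_events, entity_id, t - lookbehind, t, aggregate, fallback
        )
        # Outcome: largest value in the lookahead window, 0.0 if none
        # (binary outcome events are assumed to be coded as 1.0)
        outcome = window_value(outcome_events, entity_id, t, t + lookahead, max, 0.0)
        rows.append((entity_id, t, predictor, outcome))
    return rows


# Usage with toy data
events = [
    Event(1, datetime(2020, 1, 10), 4.0),
    Event(1, datetime(2020, 1, 20), 6.0),
    Event(1, datetime(2020, 1, 25), None),  # missing, ignored
    Event(2, datetime(2020, 1, 5), 3.0),
]
outcomes = [Event(1, datetime(2020, 2, 15), 1.0)]
prediction_times = [
    (1, datetime(2020, 2, 1)),
    (2, datetime(2020, 2, 1)),
    (3, datetime(2020, 2, 1)),   # no events at all: falls back to NaN
    (1, datetime(2020, 1, 15)),  # window would start in 2019: dropped
]
rows = flatten(
    prediction_times,
    events,
    outcomes,
    lookbehind=timedelta(days=30),
    lookahead=timedelta(days=30),
    dataset_start=datetime(2020, 1, 1),
)
# rows[0] == (1, datetime(2020, 2, 1), 5.0, 1.0)
# rows[1] == (2, datetime(2020, 2, 1), 3.0, 0.0)
```

Skipping prediction times in step (4) is only one possible policy. Keeping them and relying on the fallback would retain more rows, but their features would then be silently computed over a truncated window.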
