Abstract: Groups of sensors collecting time-series data in a variety of modalities are widely used for monitoring humans, environments, and equipment. Datasets with multimodal sensor data pose several challenges not present in many image or language datasets; most notably, there are few de facto standards for how the data should be organized and packaged. In this work, we present a framework inspired by the LLVM compiler architecture that streamlines sensor data processing for machine learning applications. Specifically, we define standardized intermediate representations that can be easily transformed for input to data preprocessing and model training steps. By standardizing and preserving time and subject information, our method supports robust label verification and multiple means of subject-independent cross-validation. We demonstrate the validity of our framework using seven different datasets, all containing time-series data and representing a variety of sensors, modalities, domains, and collection environments.
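To illustrate the two ideas the abstract highlights, the following sketch shows a hypothetical intermediate representation that preserves time and subject information, and a subject-independent fold assignment in which no subject appears in more than one fold. The `SensorRecord` fields and the `subject_independent_folds` helper are illustrative assumptions, not the paper's actual data format or API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SensorRecord:
    """Hypothetical intermediate representation: one sensor sample
    that retains the subject identity and timestamp alongside the
    modality and raw values."""
    subject_id: str
    timestamp: float
    modality: str
    values: List[float]

def subject_independent_folds(records: List[SensorRecord], n_folds: int = 3) -> List[List[str]]:
    """Assign each subject wholly to exactly one fold, so that no
    subject's data can appear in both a training and a test split."""
    subjects = sorted({r.subject_id for r in records})
    folds: List[List[str]] = [[] for _ in range(n_folds)]
    for i, subject in enumerate(subjects):
        folds[i % n_folds].append(subject)
    return folds
```

Because the representation keeps `subject_id` on every record, a split like this can be computed after any preprocessing step without re-reading the original dataset layout.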
External IDs: dblp:conf/petra/HinkleM23