Abstract: Human activity recognition (HAR) has long been an active research topic, as it enables us to infer human behaviors and daily routines from sensor data collected by wearables or by sensors embedded in a pervasive sensing environment. In recent years, deep learning has been widely used in HAR for feature extraction and multimodal fusion and has achieved promising recognition performance. However, these methods often require large amounts of labeled data for training. To directly tackle this challenge, this paper proposes SelfVis, a novel visualization-based self-supervised learning technique that aims to extract effective features without the need for labeled data. To achieve this goal, it encodes time-series IMU sensor readings into images and then employs ResNet, a pre-trained, state-of-the-art convolutional neural network (CNN), as the backbone feature extractor. It leverages the fact that multiple sensors are typically deployed together, using automatically generated sensor identifiers as the prediction target during self-supervised learning. With these two components, SelfVis achieves high activity recognition accuracy even when only a small amount of labeled data is available; with only 1% of the training data, SelfVis outperforms state-of-the-art techniques by up to 0.46 in macro F1-score.
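The pretext task described above can be illustrated with a minimal sketch. The helper names (`window_to_image`, `make_pretext_dataset`) and the specific image encoding (per-channel min-max normalization followed by nearest-neighbor resampling to a square grid) are assumptions for illustration, not the paper's actual method; the key idea shown is that the sensor index provides a free self-supervised label:

```python
import numpy as np

def window_to_image(window, size=32):
    """Encode a (timesteps, channels) IMU window as a grayscale image.

    Hypothetical encoding: min-max normalize each channel, then resample
    both axes by nearest neighbor to a size x size array. The paper's
    exact encoding may differ.
    """
    t, c = window.shape
    lo = window.min(axis=0, keepdims=True)
    hi = window.max(axis=0, keepdims=True)
    norm = (window - lo) / (hi - lo + 1e-8)  # values in [0, 1]
    rows = norm[np.linspace(0, t - 1, size).astype(int)]       # resample time
    img = rows[:, np.linspace(0, c - 1, size).astype(int)]     # resample channels
    return img

def make_pretext_dataset(sensor_streams, win=64, stride=32):
    """Build (image, sensor_id) pairs for self-supervised pre-training.

    The sensor index is generated automatically, so no human
    annotation is needed; a CNN trained to predict it learns
    features that transfer to activity recognition.
    """
    images, labels = [], []
    for sensor_id, stream in enumerate(sensor_streams):
        for start in range(0, len(stream) - win + 1, stride):
            images.append(window_to_image(stream[start:start + win]))
            labels.append(sensor_id)
    return np.stack(images), np.array(labels)
```

In a full pipeline, the resulting images would be fed to a pre-trained ResNet backbone that is first trained on the sensor-identification task and then fine-tuned on the small labeled activity set.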
External IDs: dblp:journals/tetc/JiangY25