Abstract: Recent advances in deep learning have made it increasingly feasible to estimate heart rate (HR) remotely in smart environments by analyzing videos. However, deep learning methods rely heavily on large amounts of labeled data for effective training. Self-supervised learning has emerged as a promising way to address this limitation. Building on it, we introduce a solution that uses self-supervised contrastive learning for remote photoplethysmography (PPG) estimation and HR monitoring, reducing the dependence on labeled data while improving performance. We propose three spatial and three temporal augmentations for training an encoder in a contrastive framework, and then use the encoder's late-intermediate embeddings for remote PPG and HR estimation. Experiments on two publicly available datasets show that our approach outperforms several related works as well as supervised learning baselines, with results approaching the state of the art. We also perform thorough experiments on the effects of different design choices, such as the video representation learning method and the augmentations used in the pretraining stage. Finally, we demonstrate that our method is more robust than supervised learning approaches when the amount of labeled data is reduced.
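To make the contrastive pretraining described above concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes a SimCLR-style NT-Xent loss, a toy 3D-CNN encoder, and two placeholder augmentations (a horizontal flip as a spatial augmentation and frame reversal as a temporal one) standing in for the six augmentations the abstract mentions. All names, architectures, and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch of contrastive pretraining on video clips.
# Encoder, loss, and augmentations are illustrative assumptions,
# not the paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoEncoder(nn.Module):
    """Toy 3D-CNN encoder; stands in for the paper's encoder."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool over time and space
        )
        self.proj = nn.Linear(16, embed_dim)

    def forward(self, clips):  # clips: (B, 3, T, H, W)
        feats = self.backbone(clips).flatten(1)
        return F.normalize(self.proj(feats), dim=1)

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss over two augmented views."""
    batch = z1.size(0)
    z = torch.cat([z1, z2], dim=0)             # (2B, D), L2-normalized
    sim = z @ z.t() / temperature               # pairwise cosine similarities
    mask = torch.eye(2 * batch, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))  # exclude self-similarity
    # Each view's positive is the other view of the same clip.
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)])
    return F.cross_entropy(sim, targets)

def spatial_augment(clips):
    """Placeholder spatial augmentation: horizontal flip."""
    return torch.flip(clips, dims=[-1])

def temporal_augment(clips):
    """Placeholder temporal augmentation: reverse frame order."""
    return torch.flip(clips, dims=[2])

# One pretraining step on a random batch of clips (B, C, T, H, W).
encoder = VideoEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
clips = torch.randn(8, 3, 16, 64, 64)
z1 = encoder(spatial_augment(clips))
z2 = encoder(temporal_augment(clips))
loss = nt_xent_loss(z1, z2)
loss.backward()
optimizer.step()
```

In this style of pretraining, the two augmented views of the same clip form the positive pair while all other clips in the batch act as negatives; after convergence, embeddings from an intermediate layer of the encoder would feed a downstream PPG/HR regression head, as the abstract describes.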