Abstract: Remote photoplethysmography (rPPG) measurement aims to estimate physiological signals by analyzing the subtle skin color changes that heartbeats induce in facial videos. Existing methods rely primarily on basic video-frame features or plain facial ROI (region of interest) features. Because different facial regions absorb and respond to light differently over time, we adopt a new perspective and explore, at a finer granularity, the key clues present in different facial regions within each frame and across frames. Concretely, we propose a novel clustering-driven remote physiological measurement framework called Cluster-Phys. It employs a facial ROI prototypical clustering module that adaptively clusters representative facial ROI features into facial prototypes and then updates these prototypes with the base ROI features that are semantically most correlated with them. In this way, our approach mines facial clues at a more compact and informative prototype level rather than at the conventional video/ROI level. We also propose a spatial-temporal prototype interaction module that learns facial prototype correlations from both spatial (across prototypes) and temporal (within a prototype) perspectives. Extensive experiments are conducted in both intra-dataset and cross-dataset settings. The results show that Cluster-Phys achieves a significant performance improvement at lower computational cost. The source code will be available at https://github.com/VUT-HFUT/ClusterPhys.
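As a rough illustration of the prototype idea described in the abstract, the sketch below groups the ROI feature vectors of a single frame into a few prototypes. It is not the Cluster-Phys implementation: the function names (`select_centers`, `build_prototypes`), the farthest-point heuristic for picking cluster centers, the cosine-similarity assignment, and the similarity-weighted averaging are all assumptions made for this example, and the paper's actual clustering module is learned and may differ in every one of these choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_centers(feats, k):
    """Pick k center indices by farthest-point sampling (an assumed heuristic)."""
    if not 1 <= k <= len(feats):
        raise ValueError("k must be between 1 and the number of ROI features")
    centers = [0]  # arbitrary starting ROI
    while len(centers) < k:
        # Next center: the ROI whose best match among current centers is weakest.
        nxt = min(
            (i for i in range(len(feats)) if i not in centers),
            key=lambda i: max(cosine(feats[i], feats[c]) for c in centers),
        )
        centers.append(nxt)
    return centers


def build_prototypes(feats, k):
    """Assign every base ROI feature to its most similar center, then
    update each prototype as a similarity-weighted mean of its members."""
    centers = select_centers(feats, k)
    assign = [
        max(range(k), key=lambda j: cosine(f, feats[centers[j]]))
        for f in feats
    ]
    dim = len(feats[0])
    protos = []
    for j, c in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == j]
        weights = [max(cosine(feats[i], feats[c]), 0.0) for i in members]
        total = sum(weights)
        if total == 0:
            # No usable members: fall back to the center feature itself.
            protos.append(list(feats[c]))
            continue
        protos.append([
            sum(w * feats[i][d] for w, i in zip(weights, members)) / total
            for d in range(dim)
        ])
    return protos, assign


# Toy frame with four ROI features that form two obvious groups.
frame_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes, assignment = build_prototypes(frame_feats, k=2)
print(assignment)
print(prototypes)
```

In this toy setting the downstream model would see two prototype vectors per frame instead of four ROI vectors, which is the sense in which a prototype-level representation is more compact than an ROI-level one.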
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Engagement] Emotional and Social Signals, [Content] Media Interpretation
Relevance To Conference: This paper focuses on the task of video-based rPPG estimation, which is an important topic in multimedia mediation. We propose a novel cluster-based remote physiological measurement framework called Cluster-Phys, which adaptively selects important facial ROI features as cluster centers and assigns appropriate base ROI features to those centers to build compact and representative prototypes. We also propose a spatial-temporal prototype interaction module, which supplements the facial prototypes with spatial-temporal contextual information. Finally, the compact facial prototypes from all frames are used for rPPG estimation. Extensive experiments are conducted in both intra-dataset and cross-dataset settings. The results show that Cluster-Phys achieves a significant performance improvement at lower computational cost. The advances presented in this paper show potential for a variety of applications, including physiological signal interpretation and social signal understanding, and thereby contribute to the multimedia research community.
Supplementary Material: zip
Submission Number: 1444