Automatic Detection of Subjective, Annotated and Physiological Stress Responses from Video Data

Published: 01 Jan 2022, Last Modified: 04 Apr 2025, Venue: ACII 2022, License: CC BY-SA 4.0
Abstract: Machine-learning-based stress detection systems differ with respect to the ground truth used for training the algorithms. It is unclear how models trained on different facets of the stress reaction (e.g., biological, psychological, social) can be compared, interpreted and applied. In this study, we investigate the influence of the stress label on the performance of machine learning models trained on either vocal characteristics or facial expressions extracted from videos. We collected videos from 40 male participants while they were exposed to the Trier Social Stress Test (TSST) and assessed self-reported, live-observed, video-annotated and neuro-endocrinological stress levels. We trained three standard machine learning models to separately predict the different stress labels using either voice or facial cues. Analyzing the relationships between the stress facets, we found that observers' annotations were significantly positively associated (live vs. video-annotated, $\rho_s = .53$). Similarly, the neuro-endocrinological stress indices correlated with each other (cortisol vs. sAA, $\rho_s = .39$). Machine learning experiments resulted in predictions that were positively associated with panel-annotated stress levels, with significantly stronger correlations for voice-based than for face-based models ($\rho_s = .54$ vs. $\rho_s = .30$). Predictions of self-reported stress were positively related to ground truth values for face-based models ($\rho_s = .24$) but not for voice-based models. There was no evidence for successful predictions of video annotations or endocrinological stress levels in either setting. We provide evidence that machine learning models trained on different stress assessments perform differently and should be interpreted and applied accordingly. Implications and recommendations for future work on video-based stress detection are discussed.