Keywords: Ambivalence, hesitancy, affective computing, emotion recognition in videos, multimodal, eHealth, behavioral change
TL;DR: We introduce a new dataset for Ambivalence/Hesitancy recognition in videos, with 300 participants and 1,427 videos. Data and code are publicly available.
Abstract: Ambivalence and hesitancy (A/H) are closely related constructs and the primary reason why individuals delay, avoid, or abandon health behaviour changes. A/H is a subtle and conflicting emotion that places a person in a state between positive and negative orientations, or between acceptance and refusal to do something. It manifests as discord in affect across modalities or within a single modality, such as facial and vocal expressions and body language.
Although experts can be trained to recognize A/H in in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no datasets currently exist for designing machine learning models to recognize A/H.
This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for multimodal recognition of A/H in videos. It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada answering predefined questions designed to elicit A/H.
It is intended to mirror real-world digital behaviour change interventions delivered online. BAH is annotated by three experts with timestamps indicating where A/H occurs, along with frame- and video-level annotations of A/H cues. Video transcripts, cropped and aligned faces, and participant metadata are also provided. Since ambivalence and hesitancy manifest similarly in practice, we provide a single binary annotation indicating the presence or absence of A/H.
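As a rough illustration of how such per-video annotations might be consumed, here is a minimal Python sketch. The file layout and field names (`annotations.json`-style records with `segments` and `frame_labels`) are assumptions for illustration, not the dataset's actual schema; consult the repository for the real format.

```python
import json
from pathlib import Path

# Hypothetical sketch: field names and file layout are illustrative
# assumptions, not the actual BAH annotation schema.
def load_video_annotation(path: Path) -> dict:
    """Load one video's A/H annotation record."""
    with path.open() as f:
        return json.load(f)

def frames_with_ah(record: dict) -> list[int]:
    """Return frame indices annotated as containing A/H (binary labels)."""
    # 'frame_labels' is assumed to map frame index -> 0/1
    # (absence/presence of A/H).
    return [int(i) for i, label in record["frame_labels"].items() if label == 1]

if __name__ == "__main__":
    record = load_video_annotation(Path("annotations/participant_001_q3.json"))
    print(f"A/H segments (timestamps): {record['segments']}")
    print(f"{len(frames_with_ah(record))} frames labelled as A/H")
```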
Additionally, this paper includes benchmarking results using baseline models on BAH for frame- and video-level recognition, zero-shot prediction, and personalization with source-free domain adaptation methods. The limited performance of these baselines highlights the need for multimodal and spatio-temporal models adapted to A/H recognition. Our results show that specialized fusion methods, which assess conflict between modalities, and temporal modelling, which captures within-modality conflict, are essential for more discriminant A/H recognition.
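To make the kind of multimodal baseline concrete, below is a minimal PyTorch sketch of late fusion over per-modality features for frame-level binary A/H prediction. The feature dimensions, modality encoders, and fusion scheme are assumptions for illustration only, not the baselines benchmarked in the paper; a temporal module (e.g., a GRU or temporal attention) per modality would be a natural extension for capturing within-modality conflict.

```python
import torch
import torch.nn as nn

# Minimal late-fusion sketch for binary frame-level A/H recognition.
# Feature dimensions and architecture are illustrative assumptions,
# not the baselines benchmarked in the paper.
class LateFusionAH(nn.Module):
    def __init__(self, dim_face: int = 512, dim_audio: int = 128,
                 dim_text: int = 768, hidden: int = 256):
        super().__init__()
        self.face = nn.Sequential(nn.Linear(dim_face, hidden), nn.ReLU())
        self.audio = nn.Sequential(nn.Linear(dim_audio, hidden), nn.ReLU())
        self.text = nn.Sequential(nn.Linear(dim_text, hidden), nn.ReLU())
        # Concatenated per-modality embeddings -> one A/H logit per frame.
        self.head = nn.Linear(3 * hidden, 1)

    def forward(self, face, audio, text):
        # Each input: (batch, time, dim_modality); output: (batch, time) logits.
        z = torch.cat([self.face(face), self.audio(audio), self.text(text)], dim=-1)
        return self.head(z).squeeze(-1)

if __name__ == "__main__":
    model = LateFusionAH()
    logits = model(torch.randn(2, 16, 512),   # face features
                   torch.randn(2, 16, 128),   # audio features
                   torch.randn(2, 16, 768))   # text features
    print(logits.shape)  # torch.Size([2, 16])
```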
The data, code, and pretrained weights are publicly available: https://github.com/sbelharbi/bah-dataset.
Primary Area: datasets and benchmarks
Submission Number: 20025