Abstract: Facilitation robots and agents play a significant role in enhancing engagement and harmony during social interactions. However, it is difficult for these intelligent systems to understand human states unless humans wear dedicated sensing devices that measure EEG or ECG responses, which makes in-the-wild deployment challenging. Nonverbal cues such as body language, gaze interaction, and voice activity offer a promising alternative because they implicitly reflect humans' internal states. This study explores the correlation between human states and nonverbal cues, employing a fractional factorial experimental design with factors including facilitator type, group size, and stimulus type. Human annotators and software tools labeled body language, gaze, and voice activity in the video and audio data, and time- and frequency-domain features were computed from the EEG and ECG physiological responses. A MANOVA test indicates statistical significance for all nonverbal cue classes and experimental factors; notably, facilitator type and upper-body movement showed high statistical significance. These findings are expected to enable modeling of in-the-wild human state inference from multimodal audiovisual data without requiring additional wearable sensors, complementing existing methods based solely on facial key points or audio features.
External IDs: dblp:conf/hri/ChewJ24