TA-CNN: A Unified Network for Human Behavior Analysis in Multi-Person Conversations

Published: 01 Jan 2022 · Last Modified: 10 Mar 2025 · ACM Multimedia 2022 · CC BY-SA 4.0
Abstract: Human behavior analysis in multi-person conversations is one of the most important research problems for natural human-robot interaction. However, previous datasets and studies mainly focus on single-person behavior analysis and therefore can hardly generalize to real-world application scenarios. Fortunately, the MultiMediate'22 Challenge provides various video clips of multi-party conversations. In this paper, we present a unified network named TA-CNN for both sub-challenges. TA-CNN not only models the spatio-temporal dependencies for eye contact detection, but also captures group-level discriminative features for multi-label next speaker prediction. We empirically evaluate our method on the officially provided datasets. It achieves state-of-the-art results on the corresponding test sets: an accuracy of 0.7261 for eye contact detection and a UAR of 0.5965 for next speaker prediction.
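To make the two outputs described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of a TA-CNN-style model. The abstract does not specify the architecture, so the backbone, the temporal-aggregation scheme, the feature dimension, and the number of participants below are illustrative assumptions, not the authors' actual design; they only show how a single network can expose a binary eye-contact head and a multi-label next-speaker head.

```python
# Hypothetical sketch of a TA-CNN-style model (not the paper's architecture).
# Backbone, temporal aggregation, and all layer sizes are assumptions chosen
# to illustrate the two tasks: eye contact detection (binary) and
# next speaker prediction (multi-label, one logit per participant).
import torch
import torch.nn as nn


class TACNNSketch(nn.Module):
    def __init__(self, num_participants: int = 4, feat_dim: int = 128):
        super().__init__()
        # Per-frame 2D CNN feature extractor (placeholder backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal aggregation over the frame axis (1D conv + mean pooling).
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Task heads: binary eye contact, multi-label next speaker.
        self.eye_contact_head = nn.Linear(feat_dim, 1)
        self.next_speaker_head = nn.Linear(feat_dim, num_participants)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        feats = self.temporal(feats.transpose(1, 2))          # (b, feat_dim, t)
        clip_feat = feats.mean(dim=-1)                        # temporal pooling
        return (
            torch.sigmoid(self.eye_contact_head(clip_feat)),  # eye contact prob.
            self.next_speaker_head(clip_feat),                # per-person logits
        )


if __name__ == "__main__":
    model = TACNNSketch()
    clip = torch.randn(2, 8, 3, 112, 112)  # two clips of 8 RGB frames each
    eye_contact, next_speaker = model(clip)
    print(eye_contact.shape, next_speaker.shape)  # (2, 1) (2, 4)
```

In such a design, the multi-label next-speaker logits would typically be trained with a per-participant binary cross-entropy loss, while the eye-contact head uses a standard binary cross-entropy loss; this is only one plausible training setup consistent with the abstract's description.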