Real-Time Multimodal Emotion Recognition in Conversation for Multi-Party Interactions

Published: 06 Nov 2022 · Last Modified: 11 Oct 2024 · License: CC BY-NC-ND 4.0
Abstract: Improving multi-party social interaction with artificial companions such as robots or virtual agents requires real-time Emotion Recognition in Conversation (ERC). In this context, ERC involves several challenges: processing multimodal data over time, handling a multi-party context with any number of participants, understanding the commonsense knowledge implied during interaction, and accounting for each participant's emotional attitude. To address these challenges, we design an off-the-shelf multimodal model that meets the requirements of real-life scenarios, specifically dyadic and multi-party interactions. We propose a Knowledge Aware Multi-Headed Network that integrates various sources of information, including the dialog history and commonsense knowledge about the speaker and the other participants. The weights of these pieces of information are modulated by a multi-head attention mechanism. The model is trained in a Multi-Task Learning framework that combines the ERC task with a Dialogue Act (DA) recognition task and an Emotion Shift (ES) detection task through a joint learning strategy. Our approach obtains competitive and stable results on several benchmark datasets that vary in number of participants and conversation length, and outperforms the state of the art on one of them. We also investigate the importance of DA and ES prediction in determining the speaker's current emotional state.
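To make the two mechanisms named in the abstract concrete, the sketch below (not the authors' released code) illustrates in PyTorch how multi-head attention could modulate the contribution of knowledge sources, and how a joint multi-task objective over ERC, DA, and ES might be composed. All dimensions, head counts, label sizes, and loss weights (alpha, beta, gamma) are hypothetical placeholders, and the module and function names are invented for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeAwareFusion(nn.Module):
    """Illustrative sketch of the fusion idea: the current multimodal
    utterance attends over k knowledge vectors (dialog history plus
    commonsense features for the speaker and other participants), so the
    attention weights decide how much each source contributes."""

    def __init__(self, d_model=256, n_heads=4, n_emotions=7, n_dialogue_acts=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.erc_head = nn.Linear(d_model, n_emotions)      # emotion label
        self.da_head = nn.Linear(d_model, n_dialogue_acts)  # dialogue act
        self.es_head = nn.Linear(d_model, 2)                # shift / no shift

    def forward(self, utterance, knowledge_sources):
        # utterance:         (batch, 1, d_model) current utterance feature
        # knowledge_sources: (batch, k, d_model) history + commonsense
        fused, weights = self.attn(utterance, knowledge_sources, knowledge_sources)
        h = fused.squeeze(1)
        return self.erc_head(h), self.da_head(h), self.es_head(h), weights


def joint_loss(erc_logits, da_logits, es_logits, erc_y, da_y, es_y,
               alpha=1.0, beta=0.5, gamma=0.5):
    """Joint multi-task objective: a weighted sum of the three task
    losses. The weighting scheme is an assumption for illustration."""
    return (alpha * F.cross_entropy(erc_logits, erc_y)
            + beta * F.cross_entropy(da_logits, da_y)
            + gamma * F.cross_entropy(es_logits, es_y))


# Minimal usage: one utterance attending over k=5 knowledge vectors.
model = KnowledgeAwareFusion()
x = torch.randn(8, 1, 256)
k_src = torch.randn(8, 5, 256)
erc, da, es, w = model(x, k_src)
loss = joint_loss(erc, da, es,
                  torch.randint(0, 7, (8,)),
                  torch.randint(0, 12, (8,)),
                  torch.randint(0, 2, (8,)))
```

The returned attention weights can be inspected to see how much the dialog history versus each commonsense source contributed to a prediction, which is one way the abstract's claim about modulating knowledge sources could be probed empirically.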