Improving Emotion and Intent Understanding in Multimodal Conversations With Progressive Interaction

Li Zheng, Tengyue Song, Yuzhe Ding, Xiaorui Wu, Fei Li, Dongdong Xie, Jinbo Li, Chong Teng, Donghong Ji

Published: 2026, Last Modified: 27 May 2026, IEEE Trans. Affect. Comput. 2026, CC BY-SA 4.0

Abstract: Emotion and intent joint understanding in multimodal conversations (MC-EIU) aims to model the semantic dependencies within multimodal conversations while inferring emotion and intent. Despite recent progress, existing methods overlook the differing contributions of individual modalities and rely on a single round of interaction for emotion and intent recognition, which leads to suboptimal performance. To overcome these limitations, we propose MEI-Pro, a novel framework for multimodal joint understanding of emotion and intent based on progressive interaction and adaptive weight fusion. We first design a hierarchical denoising module to remove noise and redundant information from multimodal data. We then propose an adaptive weight fusion mechanism that dynamically fuses multimodal features, taking each modality's true-class probability as its contribution to the fused representation. Additionally, we present a progressive dual-task interaction module that captures the deep-seated interactions between emotion and intent through step-by-step, multi-round iteration. Experiments on the bilingual MC-EIU benchmark dataset show that MEI-Pro significantly outperforms state-of-the-art baselines on both the emotion and intent tasks. Specifically, on the English dataset, the F1-scores of the multimodal emotion and intent understanding tasks increase by 6.12% and 7.25%, respectively.
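
The adaptive weight fusion idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: each modality has its own classifier producing logits, the weight of a modality is its softmax probability for the gold label normalized across modalities, and fusion is a weighted sum of feature vectors. The function names `true_class_weights` and `fuse` are hypothetical and are not taken from MEI-Pro. Since the gold label is only available during training, a real system would likely need a learned estimate of this confidence at inference time.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def true_class_weights(modality_logits, label):
    """Assumed weighting rule: one weight per modality, proportional to the
    probability that modality's classifier assigns to the gold label."""
    confidences = {
        name: softmax(logits)[label]
        for name, logits in modality_logits.items()
    }
    total = sum(confidences.values())
    return {name: c / total for name, c in confidences.items()}


def fuse(features, weights):
    """Weighted sum of same-length feature vectors, one per modality."""
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for name, vec in features.items():
        for i, value in enumerate(vec):
            fused[i] += weights[name] * value
    return fused


# Toy example: three modalities, two classes, gold label 0.
example_logits = {
    "text": [2.0, 0.0],   # confident and correct
    "audio": [0.0, 0.0],  # uninformative
    "video": [0.0, 2.0],  # confident but wrong
}
example_features = {
    "text": [1.0, 0.0],
    "audio": [0.0, 1.0],
    "video": [1.0, 1.0],
}
example_weights = true_class_weights(example_logits, label=0)
example_fused = fuse(example_features, example_weights)
```

In this toy setup the text modality receives the largest weight and the video modality the smallest, so the fused vector leans toward the modality whose classifier agrees with the gold label.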

External IDs: dblp:journals/taffco/ZhengSDWLXLTJ26