Cross-dialogue scene interactive knowledge enhancement for multimodal conversation emotion analysis

Published: 2025 · Last Modified: 15 Jan 2026 · Int. J. Mach. Learn. Cybern. 2025 · CC BY-SA 4.0
Abstract: Current research on multimodal conversation emotion analysis primarily focuses on modeling context within a single dialogue and on attention-based multimodal fusion. However, it has not effectively explored cross-dialogue associations, leading to the underutilization of complementary semantic information from similar dialogue scenes. This paper addresses these gaps by proposing the CrOss-dialogue Scene Interactive Knowledge Enhancement model (COSIKE), which enhances dialogue modeling by integrating cross-dialogue, cross-modal scene information and commonsense knowledge. In particular, (1) COSIKE constructs global and local scene interaction graphs from enriched scene descriptions generated by large language models, capturing both inter- and intra-dialogue associations. (2) An overlapping-graph-based multi-scene interaction learning method is proposed for scene information transfer. (3) Cross-modal commonsense distillation is employed for knowledge enhancement. Extensive experiments on the MELD and M3ED datasets demonstrate that COSIKE outperforms state-of-the-art methods.
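The abstract's core idea of a global scene interaction graph can be sketched as follows. This is an illustrative assumption, not the paper's actual implementation: each dialogue's LLM-generated scene description is assumed to be embedded as a vector, and dialogues whose scene embeddings are sufficiently similar are linked so that complementary semantic information can flow between them. The similarity threshold and toy embeddings below are hypothetical.

```python
# Hypothetical sketch of a cross-dialogue scene interaction graph:
# dialogues with similar scene-description embeddings are connected.
# The embeddings and threshold are illustrative, not from the paper.
import math

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_scene_graph(scene_embeddings, threshold=0.8):
    """Return an adjacency list linking dialogues whose scene
    embeddings exceed the similarity threshold."""
    n = len(scene_embeddings)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(scene_embeddings[i], scene_embeddings[j]) >= threshold:
                edges[i].append(j)
                edges[j].append(i)
    return edges

# Toy scene-description embeddings for four dialogues.
embs = [
    [1.0, 0.0, 0.1],    # dialogue 0: e.g. an argument in a kitchen
    [0.9, 0.1, 0.0],    # dialogue 1: a similar kitchen scene
    [0.0, 1.0, 0.0],    # dialogue 2: an unrelated scene
    [0.05, 0.95, 0.1],  # dialogue 3: similar to dialogue 2
]
print(build_scene_graph(embs))  # → {0: [1], 1: [0], 2: [3], 3: [2]}
```

In this sketch, dialogues 0–1 and 2–3 form linked pairs, mirroring how COSIKE would let similar dialogue scenes exchange complementary semantics; the paper's local (intra-dialogue) graphs and the overlapping-graph transfer mechanism are not modeled here.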