When Voices Speak Louder: Leveraging Audio Signals in Emotion-Cause Extraction via Large Multilingual Multimodal Indian Dialogue Datasets

Nishant Kumar, Srishti Gupta, Sourav Kumar Dandapat

Published: 01 Jan 2025, Last Modified: 19 Dec 2025, Venue: IEEE Signal Processing Letters, License: CC BY-SA 4.0

Abstract: Multimodal emotion and cause recognition has progressed significantly since its inception. Multimodal English datasets, along with newer work extending to languages such as Chinese and Polish, have been developed to identify emotion-cause pairs in conversations. However, these datasets often lack cultural relevance, particularly for Indian languages, which remain underrepresented in the field. This paper aims to bridge this gap by introducing new datasets in India's two most spoken languages: Hindi and Bengali. Our work provides essential resources for more culturally and linguistically diverse emotion and cause recognition tasks, contributing to the broader goal of enhancing emotion and cause detection in multilingual, multimodal Indian contexts.

External IDs: doi:10.1109/lsp.2025.3627537