PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
Abstract: While existing Aspect-based Sentiment Analysis (ABSA) has attracted extensive research effort and seen steady advancement, gaps remain in defining a more holistic research target that seamlessly integrates multimodality, conversational context, and fine granularity, while also covering changing sentiment dynamics and cognitive causal rationales. This paper bridges these gaps by introducing a multimodal conversational ABSA setting, in which two novel subtasks are proposed: 1) Panoptic Sentiment Sextuple Extraction, panoramically recognizing the holder, target, aspect, opinion, sentiment, and rationale from multi-turn, multi-party, multimodal dialogue; 2) Sentiment Flipping Analysis, detecting dynamic sentiment transformations throughout a conversation along with their causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale (10,000 dialogues), multimodality (text, image, audio, and video), multilingualism (English, Chinese, and Spanish), multiple scenarios (over 100 domains), and coverage of both implicit and explicit sentiment elements. Further, to effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed components. This work is expected to open a new era for the ABSA community, and all our code and data are available at https://PanoSent.github.io/.
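To make the sextuple structure concrete, below is a minimal illustrative sketch of how one extracted record might be represented; the field names follow the six elements defined in the abstract, but this container and the example values are hypothetical, not the released PanoSent schema.

# Illustrative only: a minimal sketch of one panoptic sentiment sextuple
# (holder, target, aspect, opinion, sentiment, rationale) as named in the
# abstract. This dataclass and the example values are hypothetical, not
# the released PanoSent annotation format.
from dataclasses import dataclass

@dataclass
class SentimentSextuple:
    holder: str     # who expresses the opinion, e.g. a dialogue speaker
    target: str     # the entity being discussed
    aspect: str     # the specific facet of the target
    opinion: str    # the opinion expression toward that aspect
    sentiment: str  # the polarity label, e.g. "negative"
    rationale: str  # the causal explanation behind the sentiment

# A hypothetical record extracted from a multi-turn dialogue:
example = SentimentSextuple(
    holder="Speaker A",
    target="the new phone",
    aspect="battery life",
    opinion="drains too fast",
    sentiment="negative",
    rationale="heavy background app usage mentioned in an earlier turn",
)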
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This manuscript proposes a new benchmark for multimodal conversational aspect-based sentiment analysis, comprising Panoptic Sentiment Sextuple Extraction and Sentiment Flipping Analysis. By harnessing audio, image, video, and text data, our approach offers a comprehensive examination of sentiment dynamics within conversations. Given ACM MM's commitment to the forefront of multimedia technology research, and particularly its Emotional and Social Signals sub-track, which emphasizes the importance of emotion in multimodal content processing, our study fits squarely within the scope of the conference. We believe that our benchmark and methodologies will advance the understanding of sentiment analysis in multimodal conversations, making a substantial contribution to the themes explored at ACM MM.
Supplementary Material: zip
Submission Number: 1596