CharacterQA: A Corpus for Multimodal Character Conversational Movie Question Answering

ACL ARR 2024 June Submission 2019 Authors

15 Jun 2024 (modified: 04 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The rapid advancement of large language models (LLMs) has sparked extensive exploration of their applications across diverse fields. Among these, personalized conversation grounded in movie characters is an attractive research direction. Such comprehensive conversations require integrating extensive multimodal information, notably visual content alongside textual data, which underscores the urgent need for a sophisticated multimodal character conversational dataset. To this end, we introduce CharacterQA, a novel video question-answering (QA) dataset for multimodal character conversation in movies. The dataset consists of 101 selected Chinese movies, each annotated with main character profiles, the character information of the scripted conversations, and their timestamps. Furthermore, questions drawn from a set of designed tasks are annotated with detailed answers; most of these questions require visual signals for logical comprehension of movie characters and plots. We then evaluate the CharacterQA dataset with MovieGPT, an advanced multimodal large language model. The results yield insightful findings that we expect to drive further development of multimodal LLMs in the character conversation field.
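The abstract describes three annotation layers per movie: main character profiles, speaker-labeled scripted conversations with timestamps, and task-specific QA pairs. As a minimal sketch of how such a record might be organized, the Python dataclasses below are purely illustrative; all field and class names (`MovieAnnotation`, `Utterance`, `QAPair`, `needs_visual`, etc.) are assumptions, not the dataset's published schema.

```python
# Hypothetical sketch of a CharacterQA-style annotation record.
# Every name here is an illustrative assumption; the submission
# does not disclose its actual schema on this page.
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str        # annotated character name
    text: str           # scripted conversation line
    start_s: float      # timestamp (seconds) into the movie
    end_s: float


@dataclass
class QAPair:
    task: str           # one of the designed task types
    question: str
    answer: str         # detailed, free-form answer
    needs_visual: bool  # most questions rely on visual signals


@dataclass
class MovieAnnotation:
    movie_id: str
    # character name -> profile text
    profiles: dict[str, str] = field(default_factory=dict)
    utterances: list[Utterance] = field(default_factory=list)
    qa_pairs: list[QAPair] = field(default_factory=list)
```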
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond; Question Answering; Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English, Chinese
Submission Number: 2019