MTChat: A Multimodal Time-Aware Dataset and Framework for Conversation

ACL ARR 2024 June Submission4344 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Understanding temporal dynamics is critical for applications ranging from conversation and multimedia content analysis to decision-making. However, time-aware datasets, particularly for conversation, remain limited, which restricts their scope and complexity. To overcome these limitations, we introduce MTChat, a multimodal time-aware dialogue dataset that integrates linguistic, visual, and temporal elements in dialogue and persona memory. Based on MTChat, we design two time-sensitive tasks, Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP), which exploit implicit temporal cues and dynamic persona changes to challenge models' temporal awareness. Furthermore, we present a novel framework with an adaptive temporal module that effectively integrates these multimodal streams and builds interconnections among them. Experimental results confirm both the novel challenges posed by MTChat and the effectiveness of our framework in multimodal time-sensitive scenarios. The code is publicly available at \url{https://anonymous.4open.science/r/MTChat-F83B/}, and MTChat has been submitted to the ARR system.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: time-sensitive, conversation, multimodal
Languages Studied: English
Submission Number: 4344