MTChat: A Multimodal Time-Aware Dataset and Framework for Conversation

ACL ARR 2024 June Submission4344 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Understanding temporal dynamics is critical for applications ranging from conversation and multimedia content analysis to decision-making. However, time-aware datasets, particularly for conversation, remain limited, which restricts their scope and complexity. To overcome these limitations, we introduce MTChat, a multimodal time-aware dialogue dataset that integrates linguistic, visual, and temporal elements in dialogue and persona memory. Based on MTChat, we design two time-sensitive tasks, Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP), which exploit implicit temporal cues and dynamic persona changes to challenge models' temporal awareness. Furthermore, we present a novel framework with an adaptive temporal module that effectively integrates these multimodal streams and builds interconnections among them. Experimental results confirm both the novel challenges posed by MTChat and the effectiveness of our framework in multimodal time-sensitive scenarios. The code is publicly available at \url{https://anonymous.4open.science/r/MTChat-F83B/}, and MTChat has been submitted to the ARR system.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: time-sensitive, conversation, multimodal
Languages Studied: English
Submission Number: 4344