Abstract: Time series data, such as sound waves, bio-signals, and user trajectories, are prevalent in social application scenarios. Because single-modal time series data often prove inadequate for addressing challenges in complicated environments, integrating multiple modalities becomes necessary to understand real-world phenomena. Utilizing multimodal data improves the effectiveness, generalizability, and robustness of deep learning systems. This survey comprehensively reviews recent advancements, covering methodologies for general multimodal time series analysis as well as efforts in various social computing contexts, including autonomous driving, healthcare, audiovisual speech recognition, and gesture recognition. We highlight the key techniques of existing studies, namely modality selection, feature extraction, and information fusion, and detail the solutions adopted under various circumstances. Finally, we discuss unresolved challenges and suggest potential future research directions. Our survey aims to provide researchers and practitioners with insights into trends, behaviors, and preferences for performing multimodal time series analysis in social computing applications.