AMTN: Attention-Enhanced Multimodal Temporal Network for Humor Detection

Yangyang Xu, Peng Zou, Rui Wang, Qi Li, Chengpeng Xu, Zhuoer Zhao, Xun Yang, Xiao Sun, Dan Guo, Meng Wang

Published: 2024, Last Modified: 22 May 2025, MuSe@ACM Multimedia 2024, CC BY-SA 4.0

Abstract: In this paper, we introduce the Attention-Enhanced Multimodal Temporal Network (AMTN) for the MuSe 2024 Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor), which targets humor detection in a cross-cultural setting using multimodal information. Specifically, we employ a Temporal Convolutional Network (TCN) to capture the temporal dynamics within each modality's features. We then apply an attention mechanism to refine the integration of information across modalities and temporal sequences. The integrated features are used for humor detection. Furthermore, we investigate the effectiveness of an end-to-end approach for this challenge. Finally, we obtain a more robust result by aggregating the outputs of multiple experiments, which constitutes our final submission for the challenge. Our solution achieves an AUC score of 0.8833 on the test dataset, outperforming the baseline by 0.0151 and securing 2nd place in the MuSe-Humor sub-challenge.
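The aggregation and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the experimental results were combined, so plain averaging of per-sample scores is an assumption here, and the helper names (`average_predictions`, `roc_auc`) and the toy scores are hypothetical, not the authors' code or data. AUC is computed as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as half.

```python
from statistics import fmean


def average_predictions(runs):
    """Average per-sample scores across runs (assumed aggregation rule)."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs[0])
    if any(len(r) != n for r in runs):
        raise ValueError("all runs must score the same samples")
    # zip(*runs) groups the scores that different runs gave to the same sample.
    return [fmean(scores) for scores in zip(*runs)]


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = humor present, 0 = humor absent.
labels = [1, 0, 1, 0, 1, 0]
run_a = [0.9, 0.2, 0.4, 0.5, 0.7, 0.1]
run_b = [0.6, 0.3, 0.8, 0.2, 0.3, 0.4]
run_c = [0.7, 0.6, 0.6, 0.3, 0.8, 0.2]

ensemble = average_predictions([run_a, run_b, run_c])
auc_single = [roc_auc(labels, r) for r in (run_a, run_b, run_c)]
auc_ensemble = roc_auc(labels, ensemble)
```

In this toy example each run misranks at least one pair, while the averaged scores rank every positive above every negative. That is the kind of robustness gain aggregation is meant to provide, though real data gives no such guarantee.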