AMTN: Attention-Enhanced Multimodal Temporal Network for Humor Detection

Published: 01 Jan 2024, Last Modified: 22 May 2025, MuSe@ACM Multimedia 2024, CC BY-SA 4.0
Abstract: In this paper, we introduce the Attention-Enhanced Multimodal Temporal Network (AMTN) for the MuSe 2024 Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor), which targets humor detection in a cross-cultural context using multimodal information. Specifically, we employ a Temporal Convolutional Network (TCN) to capture the temporal dynamics within each modality's features. We then apply an attention mechanism to refine the integration of information across modalities and temporal sequences. The integrated features are then used for humor detection. Furthermore, we investigate the effectiveness of an end-to-end approach for this challenge. Finally, we aggregate the results of multiple experiments to obtain a more robust outcome, which constitutes our final submission for the challenge. Our solution achieves an AUC of 0.8833 on the test set, outperforming the baseline by 0.0151 and securing 2nd place in the MuSe-Humor sub-challenge.
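The aggregation step and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the experimental results are combined, so the plain per-sample mean used here is an assumption, and the function names (`average_runs`, `roc_auc`) and the toy scores are hypothetical, not taken from the paper.

```python
from statistics import mean


def average_runs(run_scores):
    """Average per-sample scores across runs.

    Assumption: a plain mean; the paper's actual aggregation rule is not
    stated in the abstract.
    """
    return [mean(sample) for sample in zip(*run_scores)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = humorous segment, 0 = not humorous.
labels = [1, 0, 1, 0]
run_a = [0.9, 0.6, 0.4, 0.2]  # misranks the second positive
run_b = [0.5, 0.2, 0.9, 0.6]  # misranks the first positive
ensemble = average_runs([run_a, run_b])

print(roc_auc(labels, run_a), roc_auc(labels, run_b), roc_auc(labels, ensemble))
```

In this toy case each run misranks a different positive sample, so averaging cancels the two errors. This only illustrates why aggregation can be more robust; it says nothing about the gains reported in the paper.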