Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets

18 Oct 2020 (modified: 21 Oct 2020) · OpenReview Anonymous Preprint Blind Submission · Readers: Everyone
Abstract: Social media platforms such as Twitter often provide firsthand news during the outbreak of a crisis. Processing these reports quickly is essential for planning response efforts and minimizing loss. Doing so poses multiple challenges, such as parsing noisy messages that contain both text and images. Furthermore, these messages are diverse, ranging from personal achievements and opinions to situational crisis reports. In this paper, we therefore present an analysis of multimodal feature fusion techniques for classifying disaster tweets into multiple crisis events via transfer learning. In our study, we use three image models pre-trained on the ImageNet dataset (VGG19, ResNet-50, and AlexNet) and three fine-tuned language models (BERT, ALBERT, and RoBERTa) to learn the visual and textual features of the data and combine them to make predictions. We present a systematic analysis of multiple intra-modal and cross-modal fusion strategies and their effect on the performance of the multimodal disaster classification system. In our experiments, we used 8,242 disaster tweets, each consisting of image and text data, spanning five disaster event classes. The results show that the multimodal model using a transformer-based attention mechanism and factorized bilinear pooling (FBP) for intra-modal and cross-modal feature fusion achieves the best performance.
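For readers unfamiliar with factorized bilinear pooling, the sketch below illustrates one common formulation of cross-modal FBP fusion: a low-rank bilinear interaction between an image feature vector and a text feature vector, followed by sum pooling over k latent factors and power/L2 normalization. The feature dimensions, module names, and classifier head are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedBilinearPooling(nn.Module):
    """Fuse an image feature vector and a text feature vector with
    factorized bilinear pooling: a low-rank approximation of the full
    bilinear interaction, sum-pooled over k factors, then power- and
    L2-normalized. (A generic sketch, not the authors' exact code.)"""

    def __init__(self, img_dim, txt_dim, out_dim, k=5, dropout=0.1):
        super().__init__()
        self.k = k
        self.out_dim = out_dim
        # Low-rank projections: each fused output unit gets k latent factors.
        self.img_proj = nn.Linear(img_dim, out_dim * k)
        self.txt_proj = nn.Linear(txt_dim, out_dim * k)
        self.dropout = nn.Dropout(dropout)

    def forward(self, img_feat, txt_feat):
        # Element-wise product of the two projected modalities.
        joint = self.img_proj(img_feat) * self.txt_proj(txt_feat)
        joint = self.dropout(joint)
        # Sum-pool over the k factors to obtain the fused representation.
        joint = joint.view(-1, self.out_dim, self.k).sum(dim=2)
        # Signed square-root (power normalization) and L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-12)
        return F.normalize(joint, dim=-1)

# Hypothetical usage: 2048-d ResNet-50 pooled features and 768-d BERT [CLS]
# embeddings, fused to a 512-d vector and classified into five disaster events.
fusion = FactorizedBilinearPooling(img_dim=2048, txt_dim=768, out_dim=512)
classifier = nn.Linear(512, 5)
img_feat = torch.randn(4, 2048)   # batch of image features
txt_feat = torch.randn(4, 768)    # batch of text features
logits = classifier(fusion(img_feat, txt_feat))
print(logits.shape)  # torch.Size([4, 5])
```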