Abstract: Recent research has shed light on the capabilities of Large Multimodal Models (LMMs) across various general vision and language tasks. However, the performance of LMMs in specialized domains such as social media, which integrates text, images, videos, and sometimes audio, remains an area of active interest. Effective analysis of such content requires models to interpret the complex interactions between different communication modalities and their influence on the conveyed message. This article explores GPT-4V(ision)’s performance in social multimedia analysis. We evaluate GPT-4V across five representative tasks: sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection. Our approach includes a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a review of the results and a selection of qualitative samples that demonstrate GPT-4V’s performance in multimodal social media content analysis. GPT-4V shows effectiveness in these tasks, exhibiting capabilities such as joint image–text understanding, contextual and cultural awareness, and commonsense knowledge application. Challenges persist, however, including difficulty with multilingual social multimedia comprehension and with adapting to the latest social media trends. GPT-4V also sometimes generates incorrect information when the relevant knowledge is evolving, for example about celebrities and politicians. This preliminary study aims to inform further research across disciplines, particularly in computational social science and social media studies. The findings highlight the potential of LMMs to enhance our understanding of social media content and its users through multimodal analysis. All images and prompts used in this study will be available at https://github.com/VIStA-H/GPT-4V_Social_Media.
External IDs: dblp:journals/tist/LyuHZYMPYWL25