TL;DR: We build a computational pipeline to uncover and analyze socially meaningful linguistic variation in the choice of meme templates on Reddit.
Abstract: Much work in NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language composed of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline that clusters individual meme instances into templates and semantic variables, taking advantage of their multimodal structure. We apply this method to a large collection of meme images from Reddit and make available the resulting SemanticMemes dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, finding not only that socially meaningful variation in meme usage exists between subreddits, but also that patterns of meme innovation and acculturation within these communities align with previous findings on written language.
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English