The visual side of knowledge: the role of images in Wikipedia and Wikidata

Published: 05 Feb 2025, Last Modified: 23 Apr 2025. Submitted to WD&R. License: CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Keywords: images, visual knowledge, wikimedia commons, wikidata, most viewed images
TL;DR: What are the most viewed images across Wikimedia projects, and how are they used on Wikidata?
Abstract: This proposal addresses a critical yet underexplored aspect of Wikimedia projects: the visual representation of knowledge. While extensive research has examined textual content and its technical and social dimensions, the study of images as representations of knowledge remains significantly less developed. Although often treated as supplementary information, images play a crucial role in conveying information and in shaping how concepts are represented and understood. This is true not only for photographs and digitized artifacts, such as portraits, but also for composite images like maps, charts, and diagrams. This study aims to bridge this gap by identifying the most viewed images on Wikimedia Commons and analyzing their usage across Wikipedia and Wikidata. Key questions include: Which images are used to represent specific concepts? Which images are designated as the primary representation of Wikidata items through the image (P18) property? And which images have the greatest visual impact across Wikimedia projects?
Methodologically, the study proceeds in three steps. The first step identifies the “most seen” images. Unlike for Wikimedia pages, it is difficult to determine precisely when an image is “viewed,” as no publicly accessible data measures this directly. Instead, the study uses a proxy metric: mediacounts, which record the number of times an image is delivered to a client computer, either as a thumbnail or in high resolution. While this metric does not fully account for technical factors like preloading by client browsers, it serves as a reliable approximation. By collecting and aggregating mediacounts data over the past year, the study compiles a list of the 10,000 most requested images, both as thumbnails and in high-quality formats. In the second step, images are classified based on their content (e.g., portraits, insignia, diagrams, maps). Finally, each image’s usage on Wikidata is analyzed, focusing on which properties link to it.
The study’s results highlight how the most viewed images play an identity-defining role, providing unique representations for various concepts. This role is highly correlated with their usage on Wikidata. Images such as portraits, flags, insignia, and chemical representations feature prominently among the most viewed. These findings underscore the critical role of Wikidata in elevating certain images as primary representations of items, often facilitated through their automatic inclusion in templates.
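The aggregation step described in the abstract could look roughly like the following minimal Python sketch. It is not the authors' pipeline. It assumes that each daily mediacounts record is a tab-separated line with the file path in the first column and the total number of transfers in the third; that column layout, the function name `top_requested`, and its parameters are assumptions made for illustration and should be checked against the dataset documentation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_requested(
    lines: Iterable[str],
    n: int = 10_000,
    name_col: int = 0,   # assumed: column holding the file path
    count_col: int = 2,  # assumed: column holding the total transfer count
) -> List[Tuple[str, int]]:
    """Sum per-file request counts over many daily records and return the top n."""
    totals: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip records that are too short to contain both columns.
        if len(fields) <= max(name_col, count_col):
            continue
        try:
            totals[fields[name_col]] += int(fields[count_col])
        except ValueError:
            # Skip records whose count column is not an integer.
            continue
    return totals.most_common(n)
```

In practice the lines of all daily files for the year would be fed through this function as one stream (for example with `itertools.chain`), and a different `count_col` could be chosen to rank thumbnail and high-resolution deliveries separately, provided the dataset exposes those counts as separate columns.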
Format: Paper (20 minutes presentation)
Submission Number: 15