Estimating the Semantic Density of Visual Media

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, License: CC BY 4.0
Abstract: Image descriptions provide valuable information for a wide range of visual media management tasks, from image classification to image search. The value of curated image collections stems from both their diverse content and their extensive accompanying annotations. Such annotations are typically supplied by communities in which users, often volunteers, curate labels and/or descriptions of images. Supporting these users in increasing overall description completeness wherever possible is therefore of utmost importance. In this paper, we introduce the notion of visual semantic density, which we define as the amount of information necessary to describe an image comprehensively enough that its content can be accurately inferred from the description. Combined with the existing annotations, this measure can be used to estimate annotation completeness, helping to identify collection content with missing annotations. We conduct user experiments to understand how humans perceive visual semantic density across different image collections and to identify suitable proxy measures for it. We find that extensive image captions can serve as a proxy for calculating an image's semantic density. Furthermore, we implement a visual semantic density estimator capable of approximating human perception of the measure. We evaluate this estimator on several image datasets and conclude that it is feasible to sort images automatically by their visual semantic density, thereby enabling efficient scheduling of annotation tasks. Consequently, we believe that visual semantic density estimation can serve as a completeness measure that gives feedback to annotating users in diverse visual content ecosystems, such as Wikimedia Commons.
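To make the abstract's idea of combining a density estimate with existing annotations more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' estimator. It assumes (hypothetically) that semantic density is expressed in words, as the length of an exhaustive caption; that existing annotation information is approximated by a plain word count; and that completeness is the ratio of the two, capped at 1. The names `ImageRecord`, `completeness`, and `annotation_schedule`, and all density values below, are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    image_id: str
    annotation: str           # existing community-supplied description
    estimated_density: float  # hypothetical estimator output, in "caption words"


def word_count(text: str) -> int:
    # Crude information proxy (assumption): number of whitespace-separated tokens.
    return len(text.split())


def completeness(record: ImageRecord) -> float:
    # Assumed ratio form: information already present divided by the information
    # the image is estimated to need; capped at 1.0 so over-annotated images
    # simply count as complete. A non-positive density is treated as complete.
    if record.estimated_density <= 0:
        return 1.0
    return min(1.0, word_count(record.annotation) / record.estimated_density)


def annotation_schedule(records: list[ImageRecord]) -> list[str]:
    # Least complete images first; ties broken by higher estimated density.
    ordered = sorted(records, key=lambda r: (completeness(r), -r.estimated_density))
    return [r.image_id for r in ordered]


# Illustrative toy inputs only; the densities are made up, not measured.
records = [
    ImageRecord("street_scene", "A busy street.", estimated_density=40.0),
    ImageRecord("blue_sky", "Clear blue sky with no clouds.", estimated_density=8.0),
    ImageRecord("market", "Stalls selling fruit and fish under red awnings.", estimated_density=30.0),
]
print(annotation_schedule(records))  # ['street_scene', 'market', 'blue_sky']
```

Here the dense but tersely annotated street scene (3 of 40 words) is scheduled first, while the sparse sky image (6 of 8 words) comes last. A real system would replace the word-count proxy and the made-up densities with learned estimates.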
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Engagement] Multimedia Search and Recommendation, [Engagement] Emotional and Social Signals
Relevance To Conference: The paper introduces the task of visual semantic density estimation, which aims to quantify the amount of semantic content in visual media such as images. This measure can, for example, be used to estimate the completeness of image annotations. Since such annotations form the basis for many multimedia analysis, understanding, and retrieval tasks, mechanisms that improve them will benefit the multimedia community as a whole, either directly or indirectly.
Supplementary Material: zip
Submission Number: 4902