Multimodal Word Sense Disambiguation in Creative Practice

Manuel Ladron de Guevara, Christopher George, Akshat Gupta, Daragh Byrne, Ramesh Krishnamurti

Published: 01 Jan 2020, Last Modified: 10 May 2023, ICMLA 2020, Readers: Everyone

Abstract: Language is ambiguous; many terms and expressions can convey the same idea. This is especially true in creative practice, where ideas and design intents are highly subjective. We present a dataset, Ambiguous Descriptions of Art Images (ADARI), of contemporary workpieces, which aims to provide a foundational resource for subjective image description and multimodal word disambiguation in the context of creative practice. The dataset contains a total of 240k images labeled with 260k descriptive sentences. It is further organized into the sub-domains of architecture, art, design, fashion, furniture, product design and technology. In subjective image description, labels do not necessarily correspond to well-defined entities such as cars, quantitative attributes such as the color red, or actions such as playing. For example, the ambiguous label "dynamic" is a qualitative attribute that applies to a wide range of objects, so the variance of the data is high. To understand this complexity, we analyze the ambiguity and relevance of text with respect to images using the state-of-the-art pre-trained BERT model for sentence classification. We provide a baseline for multi-label classification tasks and demonstrate the potential of multimodal approaches for understanding ambiguity in design intentions. We hope that the ADARI dataset and baselines constitute a first step towards subjective label classification.
