Visualizable or Non-visualizable? Exploring the Visualizability of Concepts in Multi-modal Knowledge Graph

Published: 01 Jan 2022, Last Modified: 11 Feb 2025 · DASFAA (1) 2022 · CC BY-SA 4.0
Abstract: An important task in image-based Multi-modal Knowledge Graph construction is grounding concepts to their corresponding images. However, existing research overlooks the intrinsic properties of different concepts. Specifically, some concepts cannot be characterized visually, such as mind, texture, and session cookie. In this work, we define such concepts as non-visualizable concepts (NVCs), and concepts like dog that have clear and specific visual representations as visualizable concepts (VCs). We then propose a new task of distinguishing VCs from NVCs, which has rarely been addressed in existing work. To tackle this problem, we propose a multi-modal classification model that combines concept-related features from both texts and images. Due to the lack of sufficient training samples, especially for NVCs, we select concepts from ImageNet as VC instances and propose a webly-supervised method to collect a small set of NVC instances. Based on this small training set, we modify the basic two-step positive-unlabeled learning strategy to train the model. Extensive evaluations demonstrate that our model significantly outperforms a variety of baseline approaches.
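For readers unfamiliar with the training strategy mentioned above, the following is a minimal sketch of the standard two-step positive-unlabeled (PU) learning scheme that the abstract says the authors modify: treat unlabeled samples as negatives to find "reliable negatives", then retrain on positives versus those reliable negatives. The feature matrices, the logistic-regression classifier, and the `neg_fraction` threshold are illustrative assumptions, not the paper's actual model or hyperparameters.

```python
# Sketch of the basic two-step PU learning strategy (illustrative only;
# the classifier and threshold are assumptions, not the authors' setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_step_pu(X_pos, X_unlabeled, neg_fraction=0.2):
    """Step 1: train positives vs. all unlabeled to find reliable negatives.
       Step 2: retrain on positives vs. reliable negatives only."""
    # Step 1: naive classifier treating every unlabeled sample as negative.
    X = np.vstack([X_pos, X_unlabeled])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    step1 = LogisticRegression(max_iter=1000).fit(X, y)

    # Score unlabeled samples; the lowest-scoring ones (bottom neg_fraction
    # by predicted positive probability) are kept as "reliable negatives".
    scores = step1.predict_proba(X_unlabeled)[:, 1]
    k = max(1, int(neg_fraction * len(X_unlabeled)))
    reliable_neg = X_unlabeled[np.argsort(scores)[:k]]

    # Step 2: final classifier trained on positives vs. reliable negatives.
    X2 = np.vstack([X_pos, reliable_neg])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
    return LogisticRegression(max_iter=1000).fit(X2, y2)

# Toy usage: VC features as positives (P), mixed concepts as unlabeled (U).
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(50, 8))    # e.g., visualizable concepts
X_unl = rng.normal(0.0, 1.0, size=(200, 8))   # e.g., unlabeled concepts
model = two_step_pu(X_pos, X_unl)
print(model.predict_proba(X_unl[:3])[:, 1])
```

In the paper's setting, the positives would correspond to VC instances drawn from ImageNet and the unlabeled pool to remaining concepts, with the small webly-supervised NVC set motivating the authors' modification of this basic scheme.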
