Visual context embeddings for zero-shot recognition

Published: 2022, Last Modified: 13 Nov 2024, SAC 2022, CC BY-SA 4.0
Abstract: Existing word embeddings perform well in various downstream tasks, but they may be biased toward the text domain because they are learned from text corpora. When word embeddings are used in the Zero-Shot Recognition (ZSR) task, the task becomes a mapping problem between two heterogeneous domains, a low-level visual feature domain and a word-embedding domain, and the bias of word embeddings makes this mapping function difficult to learn. However, if the context of the visual domain can be learned and embedded, the ZSR mapping function will converge much more easily, because it only needs to map between domains that are more closely correlated with each other. In this paper, we therefore propose a new methodology for embedding the context contained in the visual domain using annotation information collected from an image dataset. To use these annotations for embedding, we also propose a new distance formula that measures the contextual distance between the bounding boxes of objects. Finally, experiments on two datasets verify that the embeddings learned by our methodology perform well when applied to ZSR.
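To make the mapping problem described in the abstract concrete, here is a minimal sketch of the inference step in a generic ZSR pipeline: a visual feature is projected into the class-embedding space and assigned to the nearest unseen-class embedding. Everything in it is an illustrative assumption rather than the paper's method. The linear map `W`, the toy feature vector, the two-dimensional class embeddings, and the function names `project`, `cosine`, and `predict_unseen` are hypothetical, and the abstract specifies neither the form of the mapping function nor the proposed bounding-box distance formula.

```python
import math

def project(W, x):
    """Apply a linear map W (one row per embedding dimension) to a visual feature x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_unseen(W, x, class_embeddings):
    """Return the unseen class whose embedding is closest to the projected feature."""
    z = project(W, x)
    return max(class_embeddings, key=lambda name: cosine(z, class_embeddings[name]))

# Toy data (assumed): a 3-d visual feature and 2-d class embeddings.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
class_embeddings = {"zebra": [1.0, 0.1], "whale": [0.1, 1.0]}
feature = [0.2, 0.5, 0.4]

label = predict_unseen(W, feature, class_embeddings)
print(label)
```

In this framing, the abstract's argument is that the quality of `class_embeddings` matters: if those vectors encode visual context rather than purely textual context, a map such as `W` has less of a domain gap to bridge.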