Webly Supervised Knowledge-Embedded Model for Visual Reasoning

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · IEEE Trans. Neural Networks Learn. Syst. 2024 · CC BY-SA 4.0
Abstract: Visual reasoning between visual images and natural language remains a long-standing challenge in computer vision. Conventional deep supervision methods aim to find answers to questions by relying on datasets that contain only a limited number of images with textual ground-truth descriptions. Facing learning with limited labels, it is natural to construct a larger-scale dataset consisting of several million visual examples annotated with texts, but this approach is extremely time-consuming and laborious. Knowledge-based works usually treat knowledge graphs (KGs) as static flattened tables for searching the answer, and thus fail to take advantage of the dynamic updating of KGs. To overcome these deficiencies, we propose a Webly supervised knowledge-embedded model for the task of visual reasoning. On the one hand, motivated by the overwhelming success of Webly supervised learning, we make full use of readily available Web images and their weakly annotated texts to learn an effective representation. On the other hand, we design a knowledge-embedded model that includes a dynamically updated interaction mechanism between semantic representation models and KGs. Experimental results on two benchmark datasets demonstrate that our proposed model achieves significantly better performance than other state-of-the-art approaches on the task of visual reasoning.
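The abstract gives no implementation details, but the described interaction mechanism, in which a semantic representation reads from a KG and the KG is dynamically updated in return, can be illustrated with a minimal PyTorch-style sketch. Everything below is an assumption for illustration only: the class name, the attention-based read, and the gated write-back are hypothetical stand-ins, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class KnowledgeEmbeddedReasoner(nn.Module):
    """Hypothetical sketch: a semantic query attends over KG entity
    embeddings (read step), then writes a gated update back into the
    embedding table (the 'dynamic update' of the KG)."""

    def __init__(self, num_entities: int, dim: int, write_rate: float = 0.01):
        super().__init__()
        self.kg = nn.Embedding(num_entities, dim)  # KG entities as a learnable table
        self.read_proj = nn.Linear(dim, dim)       # projects entities into query space
        self.write_gate = nn.Linear(2 * dim, dim)  # gates how much evidence flows back
        self.write_rate = write_rate

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim) joint image-text representation
        ent = self.kg.weight                                      # (num_entities, dim)
        scores = torch.softmax(query @ self.read_proj(ent).t(), dim=-1)
        context = scores @ ent                                    # read from the KG
        # Dynamic update: gate the new evidence, then distribute it
        # back to the entities in proportion to their attention mass.
        gate = torch.sigmoid(self.write_gate(torch.cat([context, query], dim=-1)))
        update = scores.t() @ (gate * query)                      # (num_entities, dim)
        self.kg.weight.data.add_(self.write_rate * update)        # in-place KG refresh
        return context


# Usage: the KG read-out would be fused with the query for answer prediction.
model = KnowledgeEmbeddedReasoner(num_entities=1000, dim=256)
q = torch.randn(4, 256)
ctx = model(q)
print(ctx.shape)  # torch.Size([4, 256])
```

The point of the sketch is the two-way flow: a static-table baseline would stop after computing `context`, whereas the write-back step is one plausible way to realize the dynamically updated KG interaction the abstract contrasts against static flattened tables.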