Joint Embeddings of Scene Graphs and Images

15 Apr 2025 (modified: 19 Feb 2017) | ICLR 2017 | Readers: Everyone
Abstract: Multimodal representations of text and images have become popular in recent years. Text, however, is inherently ambiguous when describing visual scenes, which has led to the recent development of datasets with detailed graphical descriptions in the form of scene graphs. We consider the task of jointly representing semantically precise scene graphs and images. We propose models for representing scene graphs and aligning them with images, investigating methods based on bag-of-words, subpath representations, and neural networks. Our investigation contrasts several models that can address this task and highlights some unique challenges in both model design and evaluation.
TL;DR: We propose models for embedding scene graphs in a joint space with images.
Conflicts: inria.fr, centralesupelec.fr, kuleuven.be, cs.toronto.edu
