Image Generation from Hyper Scene Graphs with Trinomial Hyperedges Using Object Attention

Ryosuke Miyake, Tetsu Matsukawa, Einoshin Suzuki

Published: 2024, Last Modified: 13 Nov 2024, VISIGRAPP (2): VISAPP 2024, CC BY-SA 4.0

Abstract: Conditional image generation, which aims to generate images consistent with a user’s input, is one of the critical problems in computer vision. Text-to-image models have succeeded in generating realistic images for simple situations in which only a few objects are present. Yet, they often fail to generate consistent images for texts that describe complex situations. Scene-graph-to-image models have the advantage of generating images for complex situations based on the structure of a scene graph. We previously extended a scene-graph-to-image model into a model that generates images from a hyper scene graph with trinomial hyperedges. This model, termed hsg2im, improved the consistency of the generated images. However, hsg2im has difficulty generating natural and consistent images for hyper scene graphs with many objects, because its graph convolutional network struggles to capture relations between distant objects. In this paper, we propose a novel image generation model which addresses thi
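The abstract attributes hsg2im's weakness on large graphs to the limited reach of its graph convolutional network. The sketch below illustrates that general point under stated assumptions. It is not the authors' implementation. Each trinomial hyperedge is encoded as a hypothetical tuple of three object indices plus a predicate label. Message passing is modeled generically as one hyperedge hop per layer. It then contrasts this with one layer of plain scaled dot-product attention over all objects, a simplified stand-in that uses raw features as queries and keys. The paper's actual object-attention mechanism is not described in this section.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a trinomial hyperedge links three object indices
# under one predicate label, e.g. (0, 1, 2, "between").
Hyperedge = Tuple[int, int, int, str]


def gcn_receptive_field(num_objects: int, edges: List[Hyperedge],
                        target: int, num_layers: int) -> Set[int]:
    """Objects whose features can influence `target` after `num_layers`
    rounds of message passing, assuming one hyperedge hop per layer."""
    # Two objects are neighbours if they share at least one hyperedge;
    # the predicate label does not affect reachability, so it is ignored.
    neighbours: Dict[int, Set[int]] = {i: set() for i in range(num_objects)}
    for a, b, c, _pred in edges:
        for u in (a, b, c):
            neighbours[u].update({a, b, c} - {u})
    reached = {target}
    frontier = {target}
    for _ in range(num_layers):
        # Expand by one hop and keep only objects not seen before.
        frontier = {v for u in frontier for v in neighbours[u]} - reached
        reached |= frontier
    return reached


def attention_weights(features: List[List[float]], target: int) -> List[float]:
    """Scaled dot-product attention weights of `target` over all objects.
    Simplification: raw features serve as both queries and keys (no
    learned projections). Softmax weights are strictly positive, so every
    object contributes after a single layer."""
    q = features[target]
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in features]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A chain of nine objects linked by four trinomial hyperedges.
edges = [(0, 1, 2, "p"), (2, 3, 4, "p"), (4, 5, 6, "p"), (6, 7, 8, "p")]
print(sorted(gcn_receptive_field(9, edges, target=0, num_layers=2)))  # → [0, 1, 2, 3, 4]

weights = attention_weights([[0.1 * i, 1.0] for i in range(9)], target=0)
print(all(w > 0 for w in weights))  # → True
```

With two layers, object 0 cannot yet receive information from objects 5 to 8. The chain needs four layers before object 8 is reached. The attention layer assigns a nonzero weight to every object at once. This is one plausible reading of why attending over objects can help with distant relations. How the proposed model actually combines attention with the hyper scene graph is beyond what this section states.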