Object-Centric Relational Representations for Image Generation

Published: 02 Jul 2024, Last Modified: 02 Jul 2024. Accepted by TMLR.
Abstract: Conditioning image generation on specific features of the desired output is a key ingredient of modern generative models. However, existing approaches lack a general and unified way of representing structural and semantic conditioning at diverse granularity levels. This paper explores a novel method to condition image generation based on object-centric relational representations. In particular, we propose a methodology to condition the generation of objects in an image on the attributed graph representing their structure and the associated semantic information. We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process and allow for regularizing the training procedure. The proposed conditioning framework is implemented by means of a neural network that learns to generate a 2D multi-channel layout mask of the objects, which can then serve as a soft inductive bias in the downstream generative task. To do so, we leverage both 2D and graph convolutional operators. We also propose a novel benchmark for image generation, consisting of a synthetic dataset of images paired with their relational representations. Empirical results show that the proposed approach compares favorably against relevant baselines.
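As a rough illustration of the general idea (not the paper's actual implementation), the sketch below shows how an attributed graph could be turned into a 2D multi-channel layout mask: node attributes are first refined with a simple graph-convolution step, then each node is rasterized as a Gaussian blob at its spatial location, weighted by its processed features. All function names, shapes, and the random weight initialization here are illustrative assumptions.

```python
import numpy as np

def graph_conv(node_feats, adj):
    """One illustrative message-passing step over an attributed graph.

    node_feats: (N, F) array of per-node attribute vectors.
    adj:        (N, N) binary adjacency matrix.
    """
    # Average neighbor features, then mix with fixed random weights
    # (a real model would learn these weights).
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    agg = adj @ node_feats / deg
    w = np.random.default_rng(0).normal(size=(node_feats.shape[1],) * 2)
    return np.tanh((node_feats + agg) @ w)

def render_layout(coords, node_feats, size=32, sigma=2.0):
    """Rasterize node features into a (F, size, size) layout mask.

    coords: list of (row, col) positions, one per node.
    """
    yy, xx = np.mgrid[0:size, 0:size]
    mask = np.zeros((node_feats.shape[1], size, size))
    for (cy, cx), f in zip(coords, node_feats):
        # Gaussian blob centered at the node's location,
        # scaled per-channel by the node's feature vector.
        blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        mask += f[:, None, None] * blob
    return mask

# Toy example: a two-node graph with 3-dimensional attributes.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
feats = np.ones((2, 3))
h = graph_conv(feats, adj)
layout = render_layout([(8, 8), (20, 20)], h, size=32)
```

The resulting `layout` tensor plays the role of a soft spatial conditioning signal that a downstream image generator could consume alongside the noise input.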
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/LucaButera/graphose_ocrrig
Assigned Action Editor: ~Ole_Winther1
Submission Number: 2183