Keywords: Scene Graph Generation, Relation Augmentation
TL;DR: We propose semantic and visual augmentation strategies to address the long-tail bias in visual relations
Abstract: The goal of scene graph generation (SGG) is to identify the relationships between objects in an image. Many recent methods have been proposed to address a critical challenge in SGG: the biased distribution of relations in the dataset and the semantic space. Although the unbiased SGG problem has recently gained popularity, current SGG research has not thoroughly examined different types of augmentation. Recent works have focused on augmenting objects rather than relations and have overlooked opportunities for pixel-level augmentation. We propose a novel relation augmentation method that applies semantic and visual perturbations to balance the relation distribution. We use relation dataset statistics to boost the representation of rare relation classes, and we use visual MixUp and grafting techniques to increase the number of samples for triplets with tail relation labels. Our proposed method, RelAug, effectively mitigates the long-tail distribution of predicates. We demonstrate that it can be easily adapted to existing methods and produces state-of-the-art performance on the Visual Genome dataset. The authors will make the source code publicly available for reproduction.
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3891