Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation
TL;DR: For the scene graph generation task, we propose the first feature-reconstruction enhancement method based on diffusion models and discretization mapping, mitigating the biased predictions caused by the long-tail distribution problem.
Abstract: Scene Graph Generation (SGG) is a fundamental task in visual understanding, aimed at providing more precise local-detail comprehension for downstream applications. When dealing with long-tail distributions, existing SGG methods often overlook both the diversity of predicate representations and the consistency among similar predicates. As a result, the model's decision layer fails to effectively capture details of tail predicates, leading to biased predictions. To address this, we propose a Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization (NoDIS) method. On the one hand, expanding the predicate representation space strengthens the model's ability to learn both common and rare predicates, reducing the prediction bias caused by data scarcity; to this end, we propose a conditional diffusion model that reconstructs features and increases the diversity of representations for same-category predicates. On the other hand, independent predicate representations in the decision phase increase the learning complexity of the decision layer, making accurate predictions more challenging; to address this, we introduce a discretization mapper that learns consistent representations among similar predicates, reducing the learning difficulty and decision ambiguity of the decision layer. To validate the effectiveness of our method, we integrate NoDIS with various SGG baseline models and conduct experiments on multiple datasets; the results consistently demonstrate superior performance.
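The abstract describes two components: a conditional diffusion model that reconstructs predicate features, and a discretization mapper that enforces consistent representations among similar predicates. Below is a minimal, illustrative sketch of how such components could be wired up; it is not the authors' implementation, and all module names, feature dimensions, noise schedule, and codebook size are assumptions made for illustration only.

```python
# Illustrative sketch (assumed design, not the released NoDIS code) of a
# conditional diffusion denoiser for predicate-feature reconstruction and a
# VQ-style discretization mapper for consistent predicate representations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a predicate feature, conditioned on context."""

    def __init__(self, feat_dim=512, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim + 1, 1024),
            nn.SiLU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, noisy_feat, cond, t):
        # t: (B, 1) normalized diffusion timesteps in [0, 1].
        return self.net(torch.cat([noisy_feat, cond, t], dim=-1))


class DiscretizationMapper(nn.Module):
    """Snaps continuous predicate features to their nearest codebook entry,
    so semantically similar predicates share a consistent representation."""

    def __init__(self, num_codes=64, feat_dim=512):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, feat_dim)

    def forward(self, feat):
        # Nearest-neighbour lookup in the codebook (VQ-style discretization).
        dist = torch.cdist(feat, self.codebook.weight)   # (B, num_codes)
        idx = dist.argmin(dim=-1)                        # (B,)
        quantized = self.codebook(idx)                   # (B, feat_dim)
        # Straight-through estimator so gradients reach the input features.
        return feat + (quantized - feat).detach(), idx


def diffusion_reconstruction_loss(denoiser, feat, cond):
    """One DDPM-style training step: corrupt the predicate feature with
    Gaussian noise and train the denoiser to predict that noise, conditioned
    on a context feature (e.g., the subject-object union representation)."""
    b = feat.size(0)
    t = torch.rand(b, 1, device=feat.device)             # random timestep
    alpha_bar = torch.cos(t * torch.pi / 2) ** 2         # simple cosine schedule
    noise = torch.randn_like(feat)
    noisy = alpha_bar.sqrt() * feat + (1 - alpha_bar).sqrt() * noise
    pred_noise = denoiser(noisy, cond, t)
    return F.mse_loss(pred_noise, noise)
```

In this sketch the diffusion loss drives online feature reconstruction (augmenting the representation space of each predicate class), while the straight-through quantization step stands in for the discretization mapping that aggregates similar predicates before the decision layer.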
Lay Summary: To address the widespread issue of data imbalance (the long-tailed distribution), we propose a feature enhancement method based on diffusion models and discretization mapping. The approach leverages the generative capability of diffusion models to perform online feature augmentation, while the discretization mapping aggregates representations of semantically similar predicates to relieve pressure on the decision layer. Applied to the scene graph generation task, our method effectively mitigates the biased predictions caused by long-tailed distributions and achieves strong performance across multiple datasets.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/gavin-gqzhang/NoDIS
Primary Area: Deep Learning
Keywords: Scene Graph Generation; Diffusion Model; Feature Reconstruction
Submission Number: 1755