BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation

Published: 01 Jan 2025, Last Modified: 17 Oct 2025, Venue: Inf. Fusion 2025, License: CC BY-SA 4.0
Abstract: Highlights
• To address the information interaction challenge in Scene Graph Generation (SGG), we propose a novel bidirectional conditioning factorization within a semantic-aligned space, enhancing information exchange between predicates and entities by introducing mutual dependence.
• We present an end-to-end SGG model, the Bidirectional Conditioning TRansformer (BCTR), to implement our factorization. Specifically, a Bidirectional Conditioning Generator (BCG) is designed to augment the feature spaces through bidirectional attention mechanisms.
• We introduce Random Feature Alignment (RFA) to regularize the feature spaces with Vision–Language Pre-trained Models (VLPMs) while preserving diversity for specific tasks. RFA helps BCG learn interaction patterns from long-tail datasets and boosts model performance on tail categories.
• Extensive experiments on the Visual Genome and Open Images V6 datasets demonstrate that BCTR achieves state-of-the-art performance compared to baselines.
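The mutual dependence between entities and predicates described in the highlights can be pictured as two cross-attention passes running in opposite directions. The following is only a minimal sketch of that general idea, not the authors' BCG: the function names (`cross_attend`, `bidirectional_condition`), the single-head attention without learned projections, the residual update, and the feature sizes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, context):
    """Single-head scaled dot-product attention (hypothetical, no learned weights)."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context, weights


def bidirectional_condition(entities, predicates):
    """Update each feature set using attention over the other one.

    Both updates are computed from the original inputs, so the two
    directions are symmetric. This is an illustrative assumption, not
    the update rule from the paper.
    """
    ent_update, w_ent = cross_attend(entities, predicates)
    pred_update, w_pred = cross_attend(predicates, entities)
    return entities + ent_update, predicates + pred_update, w_ent, w_pred


# Toy inputs: 5 entity features and 3 predicate features of width 16.
rng = np.random.default_rng(0)
entities = rng.normal(size=(5, 16))
predicates = rng.normal(size=(3, 16))

new_ent, new_pred, w_ent, w_pred = bidirectional_condition(entities, predicates)
print(new_ent.shape, new_pred.shape)  # prints (5, 16) (3, 16)
```

Each row of `w_ent` shows how strongly one entity attends to every predicate, and each row of `w_pred` shows the reverse direction; a real model would add learned projections, multiple heads, and normalization.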