Exploring the Essence of Relationships for Scene Graph Generation via Causal Features Enhancement Network
Abstract: Scene graph generation (SGG) builds a structured representation of a scene by identifying the relationships among multiple objects, supporting visual perception and reasoning tasks. Existing SGG methods often fit the distribution of relationships by introducing language priors or statistical knowledge. However, a relationship should be the semantic reflection of the interaction between objects, rather than a statistical dependency between their categories. To address this problem, we propose a novel Causal Features Enhancement Network (CFEN) to mine the essential semantic features linking objects and relationships. Specifically, we decompose object features into class-generic and object-specific components and design a causal graph framework to analyze existing SGG methods. To measure the influence of object-specific features on relationship recognition, we construct a counterfactual training framework that computes the difference between factual and counterfactual logits. Furthermore, to strengthen the role of object-specific features and learn the interactions between objects, we propose a distribution matching loss that computes the KL divergence between the counterfactual outputs and the standard difference distributions and uses it to modulate relation predictions. Finally, extensive experiments on the VG150 and VrR-VG datasets, compared against current state-of-the-art methods, demonstrate the effectiveness and superiority of the proposed CFEN.
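As a rough illustration of the counterfactual logit difference and the distribution matching loss described in the abstract, here is a minimal Python sketch under stated assumptions; it is not the CFEN implementation. The names `decompose`, `relation_logits`, `counterfactual_effect`, and `distribution_matching_loss` are hypothetical. Using a class prototype as the class-generic part, a toy linear relation classifier, zeroing the object-specific component as the counterfactual intervention, and the direction of the KL divergence are all assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Tuple

Vector = List[float]

def decompose(feature: Vector, class_prototype: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical split: the class prototype is the class-generic part and the
    residual is the object-specific part (an assumption, not the paper's module)."""
    specific = [f - p for f, p in zip(feature, class_prototype)]
    return list(class_prototype), specific

def relation_logits(generic: Vector, specific: Vector, weights: List[Vector]) -> Vector:
    """Toy linear relation classifier over the concatenated components."""
    feats = generic + specific
    return [sum(w * x for w, x in zip(row, feats)) for row in weights]

def counterfactual_effect(generic: Vector, specific: Vector, weights: List[Vector]) -> Vector:
    """Factual logits minus counterfactual logits; the counterfactual zeroes out
    the object-specific component (an assumed intervention)."""
    fact = relation_logits(generic, specific, weights)
    counterfactual = relation_logits(generic, [0.0] * len(specific), weights)
    return [f - c for f, c in zip(fact, counterfactual)]

def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distribution_matching_loss(effect: Vector, target: Vector) -> float:
    """KL(target || softmax(effect)); the KL direction and target are assumptions."""
    return kl_divergence(target, softmax(effect))

# Usage with toy values: 2-D object feature, 3 relation classes.
generic, specific = decompose([1.5, 2.0], [1.0, 0.0])
weights = [[1, 0, 1, 0], [0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]]
effect = counterfactual_effect(generic, specific, weights)  # [0.5, 2.0, 1.25]
loss = distribution_matching_loss(effect, [0.0, 1.0, 0.0])
```

With a purely linear classifier, as in this toy, the effect reduces to the contribution of the object-specific component alone; in a nonlinear relation head, the factual and counterfactual terms no longer separate this cleanly.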
External IDs: dblp:journals/pami/ZhouLZL25