Abstract: Visual relationship recognition is essential for deeper scene understanding. It aims to recognize 〈subject-predicate-object〉 triplets between object pairs. Previous methods usually treat vastly different predicates equally and neglect the subtle differences between predicates. In this paper, we propose a novel and concise approach called the "predicate-aware learning network (PAL-Net)" for visual relationship recognition. "Predicate-aware" means that we take predicates as a condition in a task-driven manner. PAL-Net consists of two key modules: i) a predicate-guided regularization module designed to learn more differentiated representations for various predicates; ii) a predicate-aware contextual modeling module that integrates the influence of contextual objects separately for different predicates. Extensive experiments on the VRD and Visual Genome datasets yield remarkable performance gains, verifying the effectiveness of PAL-Net. In addition, PAL-Net shows good applicability and achieves substantial improvement on human-object interaction detection.