Abstract: Scene Graph Generation (SGG) delivers structured knowledge to represent complex scenes and has proven effective in many computer vision tasks. However, traditional SGG models suffer from two limitations that hinder their applicability to higher-level visual tasks: (1) a rigid structure that results in low efficiency and limited flexibility, and (2) biased optimization that yields predictions favoring uninformative predicates. To resolve these two issues, we propose GSGG (Generic Scene Graph Generation), a novel, efficient, and flexible SGG model that (1) combines generalized modules to construct a high-performance, high-efficiency SGG model and (2) employs a prompt learning-based relation decoder with a novel Hierarchical Prompt (HP) learning method to mitigate biased optimization. HP composes basic prompts constrained to progressively narrowed class groups, encouraging the corresponding prompts to focus on learning increasingly informative predicates. Extensive evaluations on three SGG benchmarks demonstrate the excellent efficiency and performance of GSGG with HP. We also introduce a novel predicate generalization task with a new benchmark, and experiments on it demonstrate the effectiveness of HP in base-to-novel predicate generalization.
External IDs:dblp:journals/ijcv/ZhuXWWL25