Abstract: Panoptic Scene Graph Generation (PSG) translates visual scenes into structured linguistic descriptions, i.e., it maps visual instances to subjects/objects and their relationships to predicates. However, annotators' preferences and semantic overlaps between predicates inevitably cause multiple predicates to be mapped to one relationship, i.e., biased-predicate annotations. As a result of this contradictory mapping between vision and language, PSG models struggle to construct clear decision boundaries among predicates, which explains their currently poor performance. Resolving this multi-modal contradiction is therefore essential for the PSG task. To this end, we propose a novel method that exploits unbiased visual predicate representations for Biased-Annotation Identification (BAI) as a fundamental step for PSG/SGG tasks. BAI consists of three main steps: predicate representation extraction, predicate representation debiasing, and biased-annotation identification. Combined with flexible strategies for processing the identified biased annotations, BAI can serve as a fundamental dataset-debiasing step. Experimental results demonstrate that BAI achieves state-of-the-art performance, improving benchmark models to varying degrees when paired with well-chosen biased-annotation processing methods. Furthermore, BAI generalizes well and remains effective across multiple datasets. Our code is released at https://github.com/lili0415/BAI.
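To make the three-step pipeline named in the abstract concrete, the following is a minimal sketch in NumPy; it is not the authors' implementation (see the linked repository for that), and every function name, the mean-subtraction debiasing, and the cosine-similarity criterion are illustrative assumptions about how such a pipeline could be organized.

```python
import numpy as np

def extract_predicate_representations(features, labels, num_predicates):
    """Step 1 (sketch): build one prototype per predicate class by
    averaging the visual features of its annotated relation instances."""
    reps = np.zeros((num_predicates, features.shape[1]))
    for p in range(num_predicates):
        mask = labels == p
        if mask.any():
            reps[p] = features[mask].mean(axis=0)
    return reps

def debias_representations(reps):
    """Step 2 (sketch): subtract the component shared by all prototypes,
    a simple stand-in for whatever debiasing the paper actually uses."""
    return reps - reps.mean(axis=0, keepdims=True)

def identify_biased_annotations(features, labels, reps, margin=0.0):
    """Step 3 (sketch): flag an annotation as biased when its feature is
    closer (by cosine similarity) to another predicate's prototype than
    to its annotated one by more than `margin` (hypothetical knob)."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    r = reps / (np.linalg.norm(reps, axis=1, keepdims=True) + 1e-8)
    sims = f @ r.T                                   # (N, num_predicates)
    own = sims[np.arange(len(labels)), labels]       # similarity to own class
    return (sims.argmax(axis=1) != labels) & (sims.max(axis=1) - own > margin)
```

Annotations flagged by step 3 can then be handed to any downstream processing strategy (relabeling, down-weighting, or removal), which matches the abstract's framing of BAI as a dataset-debiasing step rather than a fixed correction rule.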