Abstract: Many recent works model Human-Object Interaction (HOI) detection as a set prediction problem using transformer architectures and demonstrate promising performance. However, previous transformer-based detectors struggle with the varying scales of the instances involved in HOI. Moreover, the naive transformer gives each query a global receptive field in which to search for contextual information, which also introduces potentially redundant information for the corresponding human-object pair. In this paper, we propose PR-Net, a Progressive Refinement Network equipped with two specially designed refinement modules that tackle these problems and provide a coarse-to-fine framework for HOI detection. In addition, the refinement modules are organized in a localization-interaction mutually guided manner to exploit the benefits of co-optimizing instance localization and interaction classification, thereby improving HOI detection performance. Our proposed method achieves competitive performance on two HOI detection benchmarks, and extensive experiments demonstrate its effectiveness.