PairDETR: Joint Detection and Association of Human Bodies and Faces

Published: 01 Jan 2024, Last Modified: 22 May 2025, Venue: CVPR 2024, License: CC BY-SA 4.0
Abstract: Image and video analysis requires not only accurate object detection but also an understanding of the relationships among detected objects. Common solutions to relation modeling typically resort to stand-alone object detectors followed by non-differentiable post-processing techniques. Recently introduced detection transformers (DETR) perform end-to-end object detection based on a bipartite matching loss. Such methods, however, lack the ability to jointly detect objects and resolve object associations. In this paper, we build on the DETR approach and extend it to the joint detection of objects and their relationships by introducing an approximated bipartite matching. While our method can generalize to an arbitrary number of objects, we here focus on the modeling of object pairs and their relations. In particular, we apply our method, PairDETR, to the problem of detecting human bodies and faces and associating the body and face that belong to the same person. Our approach not only eliminates the need for hand-designed post-processing but also achieves excellent results for body-face associations. We evaluate PairDETR on the challenging CrowdHuman and CityPersons datasets and demonstrate a large improvement over the state of the art. Our training code and pretrained models are available at https://github.com/mts-ai/pairdetr
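To make the idea of a matching over object pairs concrete, the following is a minimal illustrative sketch and not the PairDETR implementation. It assumes each prediction and each ground-truth entry is a (body box, face box) tuple, uses a plain L1 box distance as the cost, and finds the exact one-to-one assignment by brute force. The abstract does not specify the paper's approximated bipartite matching, so the exact search below is only a stand-in, and the names `l1`, `pair_cost`, and `match_pairs` are hypothetical.

```python
from itertools import permutations


def l1(box_a, box_b):
    """L1 distance between two boxes given as (x1, y1, x2, y2)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def pair_cost(pred, target):
    """Cost of matching one predicted (body, face) pair to one target pair.

    Assumption: the pair cost is the sum of the body and face box costs.
    """
    return l1(pred[0], target[0]) + l1(pred[1], target[1])


def match_pairs(preds, targets):
    """Exact minimum-cost one-to-one matching by brute force.

    Requires len(preds) >= len(targets). Returns a list of
    (prediction index, target index) tuples and the total cost.
    Only practical for a handful of items; real detectors use the
    Hungarian algorithm or an approximation instead.
    """
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return [(p, t) for t, p in enumerate(best)], best_cost


# Toy data: three predicted pairs, two ground-truth pairs.
preds = [
    ((0, 0, 10, 20), (2, 1, 4, 3)),
    ((50, 0, 60, 20), (52, 1, 54, 3)),
    ((100, 100, 110, 120), (0, 0, 0, 0)),
]
targets = [
    ((51, 0, 61, 20), (52, 1, 54, 3)),
    ((0, 0, 10, 21), (2, 1, 4, 3)),
]

matches, cost = match_pairs(preds, targets)
print(matches, cost)
```

Because each prediction carries both boxes, a single assignment decides detection and body-face association at once, which is the property the abstract contrasts with separate detectors followed by post-processing. The third prediction is left unmatched here, which in a DETR-style setup would correspond to a "no object" query.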