Abstract: Due to the high transferability of features extracted from early layers (called local features), aligning marginal distributions of local features has achieved compelling results in unsupervised domain adaptive object detection. However, such marginal feature alignment suffers from class label shift between the source and target domains. Existing class label shift correction methods focus on image classification and cannot be directly applied to object detection, because multiple objects co-occur in a single image. Meanwhile, local features have small receptive fields and can be easily mapped back to specific areas of the input image. Therefore, to handle object co-occurrence, we propose to leverage this property to decompose the source feature maps and compute the source domain class distribution at the pixel level. The decomposition is based on the overlap between each feature pixel’s receptive field and the ground-truth bounding boxes. In the target domain, where no labels are available, we estimate this distribution using predicted bounding boxes and thus obtain the estimated class label shift between domains. This estimated shift is then used to re-weight source local features during feature alignment. To the best of our knowledge, this is the first work to explicitly correct class label shift in unsupervised domain adaptive object detection. Experimental results demonstrate that this approach can systematically improve several recent domain adaptive object detectors, such as SW and HTCN, on benchmark datasets with different degrees of class label shift.
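The pixel-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the square receptive-field layout, the "largest overlap wins" assignment rule, the target-over-source weight ratio, and all function and parameter names (`pixel_labels`, `class_distribution`, `shift_weights`, `stride`, `rf_size`) are assumptions made only for illustration.

```python
import numpy as np

def pixel_labels(boxes, classes, fmap_hw, stride, rf_size):
    """Label each feature pixel with the class of the box that overlaps its
    receptive field the most (0 = background). Assumed rule, for illustration."""
    h, w = fmap_hw
    # Assumed receptive-field centers in input-image coordinates.
    cy = (np.arange(h) + 0.5) * stride
    cx = (np.arange(w) + 0.5) * stride
    half = rf_size / 2.0
    y0, y1 = (cy - half)[:, None], (cy + half)[:, None]  # shape (h, 1)
    x0, x1 = (cx - half)[None, :], (cx + half)[None, :]  # shape (1, w)
    labels = np.zeros((h, w), dtype=np.int64)
    best = np.zeros((h, w))
    for (bx0, by0, bx1, by1), c in zip(boxes, classes):
        # Intersection area between every receptive field and this box.
        ih = np.clip(np.minimum(y1, by1) - np.maximum(y0, by0), 0, None)
        iw = np.clip(np.minimum(x1, bx1) - np.maximum(x0, bx0), 0, None)
        inter = ih * iw  # shape (h, w) by broadcasting
        mask = inter > best
        labels[mask] = c
        best[mask] = inter[mask]
    return labels

def class_distribution(labels, num_classes):
    """Normalized histogram of pixel labels."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    return counts / counts.sum()

def shift_weights(p_source, p_target, eps=1e-8):
    """Per-class importance weights (assumed form: target / source)."""
    return p_target / (p_source + eps)

# Toy usage: ground-truth boxes in the source, predicted boxes in the target.
src = pixel_labels([(0, 0, 16, 16)], [1], (4, 4), stride=8, rf_size=8)
tgt = pixel_labels([(0, 0, 32, 16)], [1], (4, 4), stride=8, rf_size=8)
w = shift_weights(class_distribution(src, 3), class_distribution(tgt, 3))
pixel_weights = w[src]  # one weight per source feature pixel
```

In this toy setup, `pixel_weights` would multiply the per-pixel alignment loss on the source feature map, so that classes that are rarer in the source than in the target count more.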