Abstract: Object detection requires plentiful data annotated with bounding boxes for model training. However, in many applications, it is difficult or even impossible to acquire a large set of labeled examples for the target task due to privacy concerns or a lack of reliable annotators. On the other hand, thanks to high-quality image search engines such as Flickr and Google, it is relatively easy to obtain resource-rich unlabeled datasets whose categories are a superset of those of the target data. In this article, to improve the target model with cost-effective supervision from source data, we propose a partial transfer learning approach, QBox, that actively queries labels for bounding boxes of source images. Specifically, we design two criteria, i.e., informativeness and transferability, to measure the potential utility of a bounding box for improving the target model. Based on these criteria, QBox actively queries the labels of the most useful boxes from the source domain and thus requires fewer training examples, which saves labeling cost. Furthermore, the proposed query strategy allows annotators to simply label a specific region instead of the whole image, which significantly reduces the labeling difficulty. Extensive experiments are performed on various partial transfer benchmarks and a real COVID-19 detection task. The results validate that QBox improves detection accuracy at a lower labeling cost than state-of-the-art query strategies for object detection.
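The abstract names the two selection criteria but does not give their formulas, so the following is only a minimal sketch of box-level active querying under stated assumptions, not QBox's actual method. It assumes informativeness can be approximated by the normalized entropy of the detector's class prediction for a box, and transferability by the probability mass the detector assigns to classes shared with the target task (the target classes being a subset of the source classes in the partial transfer setting). The two scores are multiplied and the top boxes within a labeling budget are sent to annotators, so each query is a single region rather than a whole image. All names (`CandidateBox`, `informativeness`, `transferability`, `select_boxes`) and the toy probabilities are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class CandidateBox:
    """A predicted box on an unlabeled source image (hypothetical structure)."""
    image_id: str
    coords: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    class_probs: Sequence[float]  # detector's probabilities over all source classes


def informativeness(box: CandidateBox) -> float:
    """Assumed proxy: prediction entropy normalized to [0, 1]."""
    n = len(box.class_probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in box.class_probs if p > 0)
    return h / math.log(n)


def transferability(box: CandidateBox, target_classes: Set[int]) -> float:
    """Assumed proxy: probability mass on classes shared with the target task."""
    return sum(box.class_probs[c] for c in target_classes)


def select_boxes(candidates: List[CandidateBox], target_classes: Set[int],
                 budget: int) -> List[CandidateBox]:
    """Rank boxes by the product of both scores; return the top `budget` to annotate."""
    ranked = sorted(
        candidates,
        key=lambda b: informativeness(b) * transferability(b, target_classes),
        reverse=True,
    )
    return ranked[:budget]


# Toy example: 4 source classes, of which classes 0 and 1 also occur in the target task.
target = {0, 1}
candidates = [
    CandidateBox("a", (10, 10, 50, 60), [0.5, 0.5, 0.0, 0.0]),     # uncertain, all mass on target classes
    CandidateBox("b", (0, 0, 30, 30), [0.1, 0.1, 0.4, 0.4]),       # uncertain, mostly source-only classes
    CandidateBox("c", (5, 20, 40, 80), [0.97, 0.01, 0.01, 0.01]),  # confident, on a target class
]
picked = select_boxes(candidates, target, budget=2)
print([b.image_id for b in picked])  # ['a', 'b']
```

Here box "a" scores 0.5, box "b" about 0.17, and box "c" about 0.12, so with a budget of two the confident box "c" is skipped. The multiplicative combination is one simple design choice that requires a box to be both uncertain and relevant to the target classes; a weighted sum or a learned transferability estimate would be equally plausible alternatives.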