Abstract: This paper studies end-to-end windows mining directly from detection output. Traditional object detection systems handle this problem in an ad hoc manner, for example with Non-Maximum Suppression (NMS). Beyond NMS, multi-class context modeling has been explored thoroughly in recent years. However, all of these methods emphasize eliminating false positive windows rather than improving recall. To address this limitation, we first study the problem and propose semantic windows mining. To improve recall, we propose Selective Forward Search (SFS), which keeps most of the semantic windows while substantially reducing the number of false positives. After SFS, to improve precision, we present end-to-end windows mining by means of similarity refining optimized for mean Average Precision (mAP) and overlap regression. We show a noticeable improvement on the PASCAL VOC datasets in both recall and precision.
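For readers unfamiliar with the baseline named above, the following is a minimal sketch of standard greedy NMS. It is not the paper's SFS or windows mining method. The function names (`iou`, `greedy_nms`), the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold (an assumed default of 0.5).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(greedy_nms(demo_boxes, demo_scores))
```

Because the procedure only discards windows, it can lower the number of false positives but cannot recover a correct window once that window has been suppressed, which is the recall limitation the abstract points to.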