Abstract: Current state-of-the-art instance recognition models perform strongly in closed-world environments but struggle in open-world scenarios, where novel objects fall outside the pre-defined taxonomy and are therefore not annotated during training. The challenge is that novel objects and background co-exist in the unlabeled regions and are hard to tell apart. To address these unannotated regions, we present a conceptually simple yet effective open-world instance recognition model, SWORD, which answers two critical questions. (1) How can novel objects be discovered? We identify that directly training the classification branch causes the features of novel objects to degrade toward the background. We demonstrate that a simple stop-gradient operation not only prevents this feature degradation but also lets the network benefit from heuristic label assignment. (2) How can objects be distinguished from the background? By maintaining a universal object queue, we obtain an object center for contrastive learning, which enlarges the separation between objects and background. While previous works pursue only recall and neglect precision, SWORD takes both criteria into account and achieves state-of-the-art performance across various open-world cross-category and cross-dataset generalization settings. In particular, on the VOC to non-VOC setup, our method sets a new state of the art of 39.6% on $AR^b_{100}$. For COCO to UVO generalization, SWORD outperforms the previous best open-world model by 6.0% on $AP^b$ and 9.0% on $AR^b_{100}$, respectively.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)