Open-World Object Detection through Unsupervised Concept Learning and Representation Binding
Abstract: Real-world deployment of object detection systems demands robust identification of both known categories and novel out-of-distribution (OOD) instances. While current techniques improve OOD awareness through feature-space regularization and synthetic anomalies, they struggle to capture the fundamental nature of "unknown" without actual OOD exemplars. We present SLOT-DET, a framework that addresses this gap by learning disentangled representations that explicitly model both known and unknown concepts. Our approach is motivated by the observation that in natural scenes, in-distribution and OOD objects frequently co-occur, so detection requires comprehensive contextual understanding rather than isolated instance analysis. SLOT-DET operates through two complementary mechanisms: first, an unsupervised concept encoder that learns to decompose scenes into interpretable, slot-based representations using self-supervised objectives; second, an adaptive fusion module that combines these learned concepts with detection outputs through attention-based binding. The resulting framework enables explicit representation of both known and unknown concepts within a unified architecture. We additionally propose a scoring function that leverages concept activation patterns for more reliable OOD identification. Comprehensive experiments show that SLOT-DET establishes new state-of-the-art performance.
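To make the two mechanisms above more concrete, the following is a minimal, illustrative sketch and not the SLOT-DET implementation. It assumes that slot vectors have already been learned, that binding can be approximated by scaled dot-product attention from a detection feature over those slots, and that some slots are associated with known classes. The function names (`bind_to_slots`, `unknown_score`), the `known_slot_ids` set, and the score definition are all hypothetical; the abstract does not specify any of them.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bind_to_slots(det_feature, slots):
    """Assumed binding step: scaled dot-product attention of one
    detection feature (query) over slot vectors (keys and values)."""
    scale = math.sqrt(len(det_feature))
    weights = softmax([dot(det_feature, s) / scale for s in slots])
    dim = len(det_feature)
    # Attention-weighted mixture of slots, one value per feature dimension.
    fused = [sum(w * s[i] for w, s in zip(weights, slots)) for i in range(dim)]
    return weights, fused

def unknown_score(weights, known_slot_ids):
    """Hypothetical OOD score: attention mass that falls on slots
    NOT associated with any known class (higher = more likely unknown)."""
    return 1.0 - sum(weights[i] for i in known_slot_ids)

# Toy example: three 2-D slots; slots 0 and 1 are assumed to encode known concepts.
slots = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
known_slot_ids = {0, 1}

w_known, _ = bind_to_slots([0.0, 4.0], slots)    # aligns with slot 1
w_novel, _ = bind_to_slots([-4.0, 0.0], slots)   # aligns with slot 2

score_known = unknown_score(w_known, known_slot_ids)
score_novel = unknown_score(w_novel, known_slot_ids)
```

In this toy setup, a detection feature that aligns with a known-concept slot receives a lower score than one that aligns with the remaining slot, which is the behavior a concept-activation-based OOD score would be expected to show.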