Beyond the Known: Unsupervised Concept Decomposition for Open-World Object Detection
Abstract: The inability of standard object detectors to handle novel object classes is a major impediment to their deployment in uncontrolled, open-world settings. A prevailing strategy to mitigate this is to regularize the feature space of known classes, but this often fails to equip the model with a robust, generalizable notion of "unfamiliarity." We argue that a more fundamental approach is required—one that shifts focus from discriminating between known instances to understanding the compositional nature of a visual scene. To this end, we introduce CODEX (Concept-Decompositional Explanations), a framework that learns to parse scenes into a vocabulary of latent visual concepts without direct supervision. At its core, CODEX utilizes a slot-based architecture to discover these foundational concepts and then learns to dynamically bind them to object proposals from a base detector. This process effectively enriches the detector's representations with global, context-aware information, enabling it to reason about whether a given object aligns with the learned conceptual vocabulary of the environment. Our experiments show that this concept-centric paradigm significantly improves OOD detection compared to previous state-of-the-art methods.
Loading