Semantic Hallucination Mining for Unforeseen Concept Discovery in Object Detection
Abstract: Reliable detection of unanticipated concepts is vital when object detectors leave the laboratory. Most recent efforts regularise the instance embedding space through virtual outlier synthesis, yet the absence of genuine unknown samples still hampers the emergence of a convincing “otherness” representation. We introduce LSE, a plug-and-play framework that excavates latent semantic hypotheses to cultivate an implicit vocabulary of the unseen. LSE first trains an auxiliary vision–language slot synthesiser on unlabeled web imagery; the module distills a compact set of context-responsive semantic slots that encode plausible yet absent object ideas under relational sparsity constraints. Next, a cross-attention concept binder dynamically associates every region proposal with its most compatible slot, forcing the detector to rehearse responses to hallucinated semantics during training without extra annotations. At runtime, we derive an image-guided deviation score that contrasts the bound semantic hypothesis with the detector’s empirical belief, producing a calibrated indicator of unfamiliarity. Extensive experiments with COCO as the in-distribution dataset and Objects365, OpenImages, and nuScenes as out-of-distribution benchmarks demonstrate that LSE establishes a new state of the art while adding negligible computational overhead.
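To make the binder-plus-deviation-score idea concrete, the following is a minimal NumPy sketch of one plausible instantiation. It is not the paper's implementation: the hard-argmax slot binding, the cosine-based deviation, and the confidence weighting are all illustrative assumptions, and the array shapes (`proposals`, `slots`, `class_logits`) are hypothetical names for the quantities the abstract mentions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bind_and_score(proposals, slots, class_logits):
    """Illustrative sketch of a concept binder and deviation score.

    proposals:    (N, D) region-proposal embeddings
    slots:        (K, D) learned semantic slots (the "implicit vocabulary")
    class_logits: (N, C) detector class logits (its "empirical belief")
    """
    # Cross-attention over the slot bank; each proposal is bound to its
    # most compatible slot (hard argmax here purely for illustration).
    attn = softmax(proposals @ slots.T / np.sqrt(proposals.shape[1]))  # (N, K)
    bound = slots[attn.argmax(axis=1)]                                 # (N, D)

    # Deviation score: low cosine agreement between the bound semantic
    # hypothesis and the proposal embedding, weighted by the detector's
    # confidence, flags unfamiliar (possibly unknown) regions.
    cos = (proposals * bound).sum(axis=1) / (
        np.linalg.norm(proposals, axis=1) * np.linalg.norm(bound, axis=1) + 1e-8)
    confidence = softmax(class_logits).max(axis=1)
    return (1.0 - cos) * confidence  # higher => more "unknown-like"
```

Under this reading, a region whose embedding disagrees with every learned slot while the detector remains confident receives a high score, matching the abstract's "calibrated indicator of unfamiliarity".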