One-Shot Cross-Domain Instance Detection With Universal Representation

Chen Feng, Jian Cheng, Yang Xiao, Zhiguo Cao

Published: 01 Jan 2025 · Last Modified: 16 Oct 2025 · Crossref · CC BY-SA 4.0
Abstract: In this work, we address the challenging research problem of One-shot Cross-domain Instance Detection (OCID). Given a one-shot exemplar instance (e.g., from the visible spectrum), OCID aims to detect that instance's specific counterpart in another domain (e.g., the infrared spectrum). It can be applied to robotic grasping, aerospace navigation, etc. A key issue in this task is extracting features with both strong instance discrimination and domain adaptation capability. To this end, we propose to characterize the instance with a universal representation drawn from multiple deep networks pretrained on different datasets with varying representation biases. These backbone features are fused with attention weights via a Transformer to yield a universal representation that adaptively fits the one-shot instance. Meanwhile, inter- and intra-domain contrastive learning with hard sample mining drives the universal representation learning, jointly facilitating domain adaptation capability and discriminative power. For testing, we build a new dataset that contains 14 data sources with over 6K instances of diverse types and dramatic appearance variation. Experiments on this dataset verify OCID's challenges and the superiority of our proposed method. The source code and dataset have been released at https://github.com/OCIDwUR/OCID

Note to Practitioners—Given a one-shot exemplar instance, OCID aims to advance the research of general instance detectors while considering three practical challenges simultaneously: data scarcity, domain shift, and instance-level discrimination. Here, "general" denotes that the target types and task scenarios in OCID are diverse, so detection methods are challenged with complex pattern distributions and are encouraged to possess open-set recognition capability, in order to meet different detection requirements such as autonomous driving and robotics. We also consider different settings of domain shift, such as multiple spectra and image degradation, to make the task more broadly applicable. To address the challenging OCID task, we propose a Transformer-based universal representation method that learns to dynamically ensemble a group of pretrained feature backbones with various representation biases and adapts to new, even unseen, OCID tasks without fine-tuning. Domain adaptation capability and instance discrimination are jointly facilitated in a meta-learning and hence transferable manner. Experimental results on our carefully designed, diversified dataset validate the generality of our method.
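The backbone-fusion idea described above can be illustrated with a minimal sketch: features from several pretrained backbones are weighted by their affinity to the exemplar and summed. This is only a crude stand-in for the paper's Transformer-based backbone attention; the function names and the simple dot-product scoring are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_backbone_features(backbone_feats, exemplar_feat):
    """Fuse per-backbone features into one 'universal' vector.

    backbone_feats: list of 1-D feature vectors, one per pretrained backbone.
    exemplar_feat:  1-D feature vector of the one-shot exemplar.

    Each backbone is scored by cosine similarity to the exemplar, the
    scores are softmax-normalized into attention weights, and the
    fused representation is the weighted sum of backbone features.
    (Hypothetical scoring rule; the paper uses a learned Transformer.)
    """
    feats = [f / np.linalg.norm(f) for f in backbone_feats]
    q = exemplar_feat / np.linalg.norm(exemplar_feat)
    scores = np.array([f @ q for f in feats])
    weights = softmax(scores)
    fused = sum(w * f for w, f in zip(weights, feats))
    return fused, weights
```

Backbones whose representation bias matches the exemplar contribute more, which is the intuition behind adaptively fitting the one-shot instance.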
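The inter- and intra-domain contrastive learning with hard sample mining can likewise be sketched with a standard InfoNCE-style loss, where only the hardest (most similar) negatives are retained. The function name, the temperature value, and the top-k mining rule are assumptions; the paper's actual loss and mining strategy may differ.

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, tau=0.07, top_k_hard=None):
    """InfoNCE-style contrastive loss with optional hard-negative mining.

    anchor:    exemplar feature (e.g., visible-spectrum instance).
    positive:  matching instance feature from the other domain.
    negatives: list of non-matching instance features.
    top_k_hard: if set, keep only the k most anchor-similar negatives,
                a simple form of hard sample mining (assumed rule).
    """
    def norm(v):
        return v / np.linalg.norm(v)

    a, p = norm(anchor), norm(positive)
    negs = np.stack([norm(n) for n in negatives])
    neg_sims = negs @ a
    if top_k_hard is not None:
        # Hard mining: negatives most similar to the anchor dominate the loss.
        neg_sims = np.sort(neg_sims)[-top_k_hard:]
    logits = np.concatenate(([p @ a], neg_sims)) / tau
    logits -= logits.max()  # numerical stability
    # -log( exp(pos) / sum(exp(all)) )
    return -logits[0] + np.log(np.exp(logits).sum())
```

Minimizing this loss pulls cross-domain positives together (domain adaptation) while pushing apart other instances (instance discrimination), matching the joint objective described in the abstract.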