Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection

Published: 01 Jan 2025, Last Modified: 21 Jul 2025. IEEE Trans. Pattern Anal. Mach. Intell. 2025. License: CC BY-SA 4.0
Abstract: Few-shot object detection (FSOD) aims to detect novel objects from only a few training samples. The key challenge is to construct a generalized feature space for novel categories from limited data, using the base-category space to adapt the detection model. Most fine-tuning methods address this by pre-training on base categories and then fine-tuning on novel ones. However, the scarcity of novel samples causes two problems: (1) novel-category features are easily represented implicitly by base-category features, which leads to inseparable classifier boundaries; and (2) the few available samples cannot fully represent a novel category's distribution, so fine-tuning is prone to overfitting. To address these issues, we propose a generalized feature learning method for FSOD that leverages side information. Specifically, we first construct a knowledge matrix from embedded side information to model the semantic relations between base and novel categories. Then, to strengthen discrimination between semantically similar categories, we develop contextual semantic supervised contrastive learning, which embeds side information. To mitigate overfitting caused by limited samples, we introduce a side-information-guided region-aware masking module that improves sample diversity by removing biased information through counterfactual explanations. Extensive experiments with ResNet and ViT backbones on the PASCAL VOC, MS COCO, LVIS V1, FSOD-1K, and FSVOD-500 benchmarks demonstrate that our model outperforms previous state-of-the-art methods, significantly improving FSOD performance in most shots/splits.
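To make the first two components more concrete, the following is a minimal sketch, not the paper's formulation, which the abstract does not specify. It assumes the side information is a fixed embedding per category name (for example, word vectors), that the knowledge matrix is their pairwise cosine similarity, and that the contrastive loss up-weights negatives from semantically similar categories. The function names `knowledge_matrix` and `semantic_supcon_loss`, the weighting rule `1 + max(K, 0)`, and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np


def knowledge_matrix(class_embeddings):
    """Pairwise cosine similarity between category embeddings.

    Assumption: each row is a side-information embedding for one category
    (base or novel). The result is a (C, C) symmetric matrix.
    """
    norms = np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    unit = class_embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def semantic_supcon_loss(features, labels, K, temperature=0.1):
    """Supervised contrastive loss with semantically weighted negatives.

    Assumption (illustrative only): a negative pair from categories a, b
    gets weight 1 + max(K[a, b], 0) in the denominator, so semantically
    close categories are pushed apart more strongly.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.clip(norms, 1e-12, None)
    logits = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    same = (labels[:, None] == labels[None, :]) & not_self

    # Look up the semantic similarity for every pair of samples.
    sem = np.clip(K[labels][:, labels], 0.0, None)
    weights = np.where(same, 1.0, 1.0 + sem) * not_self

    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    denom = (np.exp(logits) * weights).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)

    # Average over positives, skipping anchors that have none.
    pos_count = same.sum(axis=1)
    valid = pos_count > 0
    per_anchor = -(log_prob * same).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = knowledge_matrix(rng.normal(size=(3, 8)))   # 3 hypothetical categories
    feats = rng.normal(size=(6, 16))                # 6 region features
    labels = np.array([0, 0, 1, 1, 2, 2])
    print(semantic_supcon_loss(feats, labels, K))
```

Under these assumptions, passing an identity matrix as `K` recovers a plain supervised contrastive loss, which makes the effect of the semantic weighting easy to compare. The batch must contain at least one positive pair for the result to be defined.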