PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | Everyone | CC BY 4.0
Abstract: In recent years, Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress due to its ability to learn models with strong generalization power from extremely limited annotated data. The fine-tuning-based paradigm has become mainstream for FSOD: detectors are first pretrained on base classes with sufficient samples and then fine-tuned on novel classes with few annotated samples. However, the scarcity of novel-class samples hampers the precise capture of their data distribution. To address this issue, we propose a novel framework for few-shot object detection, namely Prototype-based Soft-labels and Test-Time Learning (PS-TTL). Specifically, we design a Test-Time Learning (TTL) module that employs a mean-teacher network for self-training to discover novel instances on test data, effectively alleviating overfitting to the distribution of the base classes. Furthermore, we develop a Prototype-based Soft-labels (PS) strategy that assesses similarities between pseudo-labels and category prototypes to unleash the potential of low-quality pseudo-labels, thereby significantly mitigating the constraints posed by few-shot samples. Extensive experiments on both the VOC and COCO benchmarks show that PS-TTL achieves a new state of the art, highlighting its effectiveness.
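The abstract names two mechanisms without giving formulas: a mean-teacher network for self-training and soft labels derived from similarity to category prototypes. The following is a minimal sketch of how such components are commonly built, not the authors' implementation. The function names, the exponential-moving-average momentum, the use of cosine similarity, and the softmax temperature are all assumptions introduced here for illustration.

```python
import math


def ema_update(teacher, student, momentum=0.999):
    """Hypothetical mean-teacher step: teacher weights follow an
    exponential moving average of the student weights."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def prototype_soft_label(feature, prototypes, temperature=0.1):
    """Hypothetical soft label: softmax over the similarities between a
    pseudo-labeled instance feature and one prototype per category."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two toy category prototypes in a 2-D feature space.
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    # A pseudo-labeled instance whose feature lies closer to the first prototype.
    print(prototype_soft_label([0.9, 0.1], prototypes))
    # One teacher update from toy student weights.
    print(ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.9))
```

In this sketch, a low-confidence pseudo-label is not discarded; it receives a probability distribution over categories, so an instance that resembles several prototypes contributes a correspondingly softer training signal.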
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: Our research focuses on few-shot object detection. In multimedia systems, processing and understanding large volumes of visual data is a core task, and many researchers study how to use textual cues for object detection. However, some categories are difficult to describe in text. As the saying goes, "a picture is worth a thousand words": a few images can serve as visual cues for detection. Few-shot object detection therefore has broad applications in the multimedia field. For example, it can help multimedia systems quickly detect objects of new categories even when the number of training samples for these categories is limited. Our work enhances the detection performance of few-shot object detectors. We propose a test-time learning framework that enables the detection system to learn online during use, and we introduce a prototype-based soft-labeling strategy to fully utilize the few-shot samples. Our work can be integrated into text-based object detectors to further enhance the capabilities of multimedia systems.
Supplementary Material: zip
Submission Number: 2882