Abstract: Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that
it generalizes to any unseen target domain. While this has
been well studied for image classification, the literature on
SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage
a pre-trained vision-language model to introduce semantic
domain concepts via textual prompts. We achieve this via
a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based
classification loss. Our experiments evidence the benefits of
our approach, outperforming by 10% the only existing SDG
object detection method, Single-DGOD [52], on their own
diverse weather-driving benchmark.
0 Replies
Loading