Region-Based Representations Revisited

Published: 01 Jan 2024, Last Modified: 10 Nov 2024 | CVPR 2024 | CC BY-SA 4.0
Abstract: We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel- and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters such as SAM can be effectively combined with strong self-supervised representations, such as those from DINOv2, and used for a wide variety of tasks, including semantic segmentation, object-based image retrieval, and multi-image analysis. Once the masks and features are extracted, these representations enable competitive performance even with linear decoders, making them well suited to applications that require custom queries. Their compactness also makes them well suited to video analysis and other problems requiring inference across many images.
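To make the idea concrete, the following is a minimal sketch of one way a region-based representation could be built and decoded. It is not the authors' implementation: the abstract does not specify how features are aggregated per region, so mask-average pooling is an assumption here, and the random arrays stand in for dense features (e.g. from DINOv2) and class-agnostic masks (e.g. from SAM). The function names `pool_region_features` and `linear_decode` are hypothetical.

```python
import numpy as np


def pool_region_features(features, masks):
    """Average a dense feature map inside each binary mask.

    features: (H, W, D) float array, a stand-in for upsampled dense features.
    masks:    (R, H, W) boolean array, one mask per region.
    Returns an (R, D) array holding one feature vector per region.
    """
    num_regions = masks.shape[0]
    dim = features.shape[-1]
    flat_feats = features.reshape(-1, dim)                          # (H*W, D)
    flat_masks = masks.reshape(num_regions, -1).astype(features.dtype)  # (R, H*W)
    areas = flat_masks.sum(axis=1, keepdims=True)                   # pixels per region
    # Assumption: plain averaging; guard against empty masks.
    return flat_masks @ flat_feats / np.maximum(areas, 1.0)


def linear_decode(region_feats, weights, bias):
    """Apply a linear decoder to each region vector, giving class logits."""
    return region_feats @ weights + bias


# Synthetic stand-ins for real model outputs (assumed shapes, random values).
rng = np.random.default_rng(0)
H, W, D, num_classes = 8, 8, 4, 3
features = rng.normal(size=(H, W, D))

masks = np.zeros((2, H, W), dtype=bool)
masks[0, :, :4] = True   # left half of the image
masks[1, :, 4:] = True   # right half of the image

region_feats = pool_region_features(features, masks)   # shape (2, D)

# Untrained decoder weights, for illustration only.
weights = rng.normal(size=(D, num_classes))
bias = np.zeros(num_classes)
labels = linear_decode(region_feats, weights, bias).argmax(axis=1)

# Paint each region's predicted label back onto its pixels.
segmentation = np.full((H, W), -1)
for r in range(masks.shape[0]):
    segmentation[masks[r]] = labels[r]
```

In this sketch the image is summarized by two D-dimensional vectors rather than H × W per-pixel features, which illustrates the compactness the abstract refers to. Once the region vectors are stored, a new query only requires applying a different linear decoder to them.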