Foundation Model-Based Data Selection for Dense Prediction Tasks

Niclas Popp; Dan Zhang; Jan Hendrik Metzen; Matthias Hein; Lukas Schott

Published: 06 Mar 2025, Last Modified: 22 Mar 2025. Venue: ICLR 2025 FM-Wild Workshop. License: CC BY 4.0

Keywords: data selection, coreset, data pruning, object detection, semantic segmentation, dense prediction tasks, active learning

TL;DR: We discuss how to use foundation models for data selection to make effective use of a constrained annotation budget in dense prediction tasks.

Abstract: Data selection, the problem of selecting a small dataset to be labeled from a large unlabeled pool, is an important practical problem. In particular, dense prediction tasks such as object detection and segmentation require high-quality labels at the pixel level, which are particularly costly to obtain. We propose object-focused data selection (OFDS), which leverages object-level representations from foundation models to ensure that the selected image subsets semantically cover the target classes, including rare ones. We show that OFDS achieves state-of-the-art performance for both object detection and image segmentation, with substantial improvements over all baselines in scenarios with imbalanced class distributions. Moreover, we demonstrate that pre-training with autolabels from foundation models on the full datasets before fine-tuning on human-labeled subsets selected by OFDS further enhances the final performance. Finally, OFDS consistently improves active learning methods when it replaces random selection of the initial labeled dataset, the so-called "cold start problem" of active learning.

Submission Number: 25
