Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection
TL;DR: This paper investigates the effectiveness of using foundation models (FMs) as information extractors for one-shot subset selection across a range of image datasets, and proposes a novel multi-foundation-model subset selection method called RAM-APL.
Abstract: One-shot subset selection serves as an effective tool to reduce deep learning training costs by identifying an informative data subset based on the information extracted by an information extractor (IE). Traditional IEs, typically pre-trained on the target dataset, are inherently dataset-dependent. Foundation models (FMs) offer a promising alternative that can potentially mitigate this limitation. This work investigates two key questions: (1) Can FM-based subset selection outperform traditional IE-based methods across diverse datasets? (2) Do all FMs perform equally well as IEs for subset selection? Extensive experiments uncover surprising insights: FMs consistently outperform traditional IEs on fine-grained datasets, whereas their advantage diminishes on coarse-grained datasets with noisy labels. Motivated by these findings, we propose RAM-APL (RAnking Mean-Accuracy of Pseudo-class Labels), a method tailored for fine-grained image datasets. RAM-APL leverages multiple FMs to enhance subset selection by exploiting their complementary strengths. Our approach achieves state-of-the-art performance on fine-grained datasets, including Oxford-IIIT Pet, Food-101, and Caltech-UCSD Birds-200-2011.
Lay Summary: Training deep learning models can be expensive and time-consuming, especially when working with large datasets. One way to reduce these costs is by selecting a smaller, informative subset of the data to train on. Traditionally, this requires information extractors that are specifically pre-trained on the same dataset—a process that is both inefficient and inflexible.
Our research explores whether general-purpose foundation models can serve as a better alternative. We asked: Can these models help choose better data subsets, and do all foundation models perform equally well? We found that foundation models consistently outperform traditional methods on fine-grained datasets—those requiring subtle visual distinctions, like between dog breeds or bird species. However, their advantage becomes less pronounced on coarse-grained datasets that contain noisy labels. Based on these findings, we developed RAM-APL, a new method that combines multiple foundation models to leverage their complementary strengths. Our approach achieves state-of-the-art results on several fine-grained image benchmarks.
This work provides practical guidance on how and when to use foundation models for data selection to make deep learning more efficient.
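To make the pipeline concrete, here is a minimal sketch of FM-based one-shot subset selection. It uses a generic k-center greedy coreset strategy over embedding vectors; this is a common baseline for geometry-based subset selection, not the paper's RAM-APL algorithm, and the random features here merely stand in for embeddings from a frozen foundation model such as CLIP or DINO.

```python
import numpy as np

def k_center_greedy(features, budget, seed=0):
    """Greedy k-center subset selection over feature vectors.

    Repeatedly picks the point farthest from the already-selected set,
    so the subset covers the embedding space. A standard one-shot
    selection baseline (illustrative; not RAM-APL).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance of every point to its nearest selected point so far.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        idx = int(np.argmax(dists))  # farthest uncovered point
        selected.append(idx)
        dists = np.minimum(
            dists, np.linalg.norm(features - features[idx], axis=1)
        )
    return selected

# Random vectors stand in for foundation-model image embeddings.
feats = np.random.default_rng(42).normal(size=(1000, 64))
subset = k_center_greedy(feats, budget=50)
print(len(subset), len(set(subset)))  # 50 distinct training indices
```

In an FM-based setting, the only change is that `feats` would be computed once by a pretrained encoder over the full training set; the selected indices then define the reduced training subset.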
Primary Area: General Machine Learning
Keywords: one-shot subset selection, foundation models, data-efficient learning
Submission Number: 9286