Keywords: Coreset selection, Submodularity
TL;DR: MODE adaptively selects the most useful data throughout training, achieving theoretical guarantees and scalable efficiency while preserving model performance with far less data.
Abstract: We present MODE (Multi-Objective adaptive Data Efficiency), a framework that dynamically combines coreset selection strategies according to their evolving contribution to model performance. Unlike static methods, MODE adapts its selection criteria to the training phase, emphasizing class balance early, diversity during representation learning, and uncertainty near convergence. We show that MODE achieves a $(1-1/e)$-approximation guarantee with $O(n \log n)$ complexity, and we demonstrate competitive accuracy alongside interpretable insights into how data utility evolves. Experiments show that MODE reduces memory requirements
while providing actionable guidance on which data types matter most during each training phase.
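The submission does not include implementation details, but a minimal sketch may help make the method concrete. The Python code below illustrates one way phase-adaptive weighting could be combined with lazy greedy submodular maximization, the standard route to a $(1-1/e)$ guarantee for monotone submodular objectives under a cardinality budget. All names (phase_weights, select_coreset), the weight schedule, and the scoring terms are assumptions for illustration, not MODE's actual API or objective.

import heapq
import numpy as np

def phase_weights(progress):
    # Hypothetical schedule mirroring the abstract: class balance early,
    # diversity during representation learning, uncertainty near convergence.
    if progress < 0.3:
        return 0.6, 0.3, 0.1      # (balance, diversity, uncertainty)
    if progress < 0.7:
        return 0.2, 0.6, 0.2
    return 0.1, 0.3, 0.6

def select_coreset(X, y, uncertainty, budget, progress):
    """Lazy greedy selection under a phase-weighted submodular objective.

    X: (n, d) unit-normalized features; y: (n,) labels;
    uncertainty: (n,) per-example model uncertainty in [0, 1].
    """
    n = X.shape[0]
    w_bal, w_div, w_unc = phase_weights(progress)
    sim = X @ X.T                 # cosine similarities (facility-location term)
    cover = np.zeros(n)           # best similarity of each point to selected set
    class_counts = {}             # selected examples per class

    def gain(j):
        g_div = np.maximum(sim[j] - cover, 0.0).sum() / n   # facility location
        g_bal = 1.0 / (1 + class_counts.get(int(y[j]), 0))  # diminishing per class
        return w_bal * g_bal + w_div * g_div + w_unc * float(uncertainty[j])

    # Max-heap of (possibly stale) upper bounds on marginal gains; by
    # submodularity cached gains only shrink, so re-check just the top element.
    heap = [(-gain(j), j) for j in range(n)]
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < budget:
        _, j = heapq.heappop(heap)
        fresh = gain(j)
        if heap and fresh < -heap[0][0]:      # bound is stale: reinsert, retry
            heapq.heappush(heap, (-fresh, j))
            continue
        selected.append(j)
        cover = np.maximum(cover, sim[j])
        class_counts[int(y[j])] = class_counts.get(int(y[j]), 0) + 1
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # normalize for cosine sim
    y = rng.integers(0, 5, size=200)
    print(select_coreset(X, y, rng.random(200), budget=20, progress=0.8))

Lazy greedy returns the same solution as naive greedy but re-evaluates only elements whose cached gains could still be maximal, which is where near-$O(n \log n)$ behavior typically comes from in practice.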
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12768