Keywords: Data Pruning, Fairness, Classification Bias, Robustness
TL;DR: We conduct the first systematic evaluation of data pruning algorithms in the fairness framework and propose a "fairness-aware" random pruning protocol that
Abstract: In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning algorithms can produce highly biased classifiers. At the same time, we argue that random data pruning with appropriate class ratios has potential to improve the worst-class performance. We propose a "fairness-aware" approach to pruning and empirically demonstrate its performance on standard computer vision benchmarks. In sharp contrast to existing algorithms, our proposed method continues improving robustness at a tolerable drop of average performance as we prune more from the datasets.
Primary Subject Area: Data-centric approaches to AI alignment
Paper Type: Extended abstracts: up to 2 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 73
Loading