Keywords: active learning, deep learning, generalization error, self-supervised learning, cold-start problem, uncertainty, scalability
TL;DR: We present a principled, theory-guided framework that explains and predicts the behavior of diverse active learning (AL) strategies across datasets, budget regimes, and training conditions.
Abstract: Active learning (AL) aims to reduce annotation costs by querying informative and representative samples for labeling. Despite significant progress, many AL methods remain heuristic and lack a unified theoretical foundation. We present the first systematic, theory-guided framework that connects the core principles of AL (uncertainty, representation, and coverage) to a decomposition of generalization error into empirical risk, distributional discrepancy, model complexity, and confidence. This mapping explains why different AL strategies excel under different annotation budgets and provides a blueprint for designing future methods. Our analysis unifies prior empirical observations under a generalization-theoretic foundation and is complemented by extensive experiments on CIFAR and ImageNet subsets with self-supervised embeddings and pretrained encoders. Results show that representation is critical in early rounds to address cold-start issues, coverage promotes diversity in mid-budget regimes, and uncertainty becomes most effective once decision boundaries are partially learned. We also observe that while per-sample reductions in model complexity are modest, their cumulative effect across acquisition rounds is substantial. We further assess runtime behavior, highlighting trade-offs between theoretical alignment and scalability. Rather than proposing a new method, we contribute a unifying, generalizable framework that explains existing strategies and guides principled AL design, bridging theory and practice.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14651