Keywords: Active Learning; Subset Selection; quantum acceleration; Maximum Mean Discrepancy
Abstract: Active learning (AL) techniques select the most informative data points from large datasets, enhancing model performance with fewer labeled samples. This makes AL particularly useful when labeled data is scarce or labeling is resource-intensive. However, most effective existing methods rely on uncertainty scores to select samples, often overlooking diversity; this leads to redundant selections, especially when the batch size is small relative to the overall dataset. This paper introduces Efficient Blockwise Diverse Active Learning (EBDAL), a generalizable framework that combines uncertainty with diversity-based selection to overcome these limitations. By partitioning the dataset into blocks via a clustering strategy, we ensure diverse sampling within each block and enable efficient handling of large-scale datasets. To quantify diversity, we minimize the Maximum Mean Discrepancy (MMD) between the selected subset and the full dataset, which we reformulate as a Quadratic Unconstrained Binary Optimization (QUBO) problem. The resulting QUBO objective is submodular, permitting an efficient greedy algorithm. We further demonstrate feasibility on real quantum hardware through an end-to-end selection experiment. Our experiments show that EBDAL not only improves the accuracy of uncertainty-based strategies but also outperforms a wide range of selection methods while achieving substantial computational speedups. These findings highlight EBDAL's robustness, efficiency, and adaptability across various datasets.
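The MMD-minimization and greedy-selection steps described in the abstract can be sketched as follows. This is a minimal illustration under assumed choices (an RBF kernel, the function names `rbf_kernel` and `greedy_mmd_select`, and the `gamma` parameter are all hypothetical), not the paper's actual implementation or its blockwise clustering stage:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def greedy_mmd_select(X, k, gamma=1.0):
    """Greedily pick k indices whose empirical distribution minimizes the
    (squared) MMD to the full dataset. The objective below is the QUBO-style
    form: a quadratic self-similarity term minus a linear coverage term;
    the constant full-data term is dropped since it does not affect argmin."""
    n = len(X)
    K = rbf_kernel(X, gamma)
    mean_k = K.mean(axis=1)  # (1/n) * sum_j K[i, j] -> linear coverage term
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best, best_val = None, np.inf
        for i in remaining:
            cand = selected + [i]
            m = len(cand)
            # MMD^2 up to a constant:
            #   (1/m^2) * sum_{i,j in S} K_ij  -  (2/(m*n)) * sum_{i in S, j} K_ij
            val = K[np.ix_(cand, cand)].sum() / m**2 - 2.0 * mean_k[cand].mean()
            if val < best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset with three well-separated clusters, the greedy objective rewards covering each cluster once before revisiting any of them, which illustrates the diversity behavior the abstract attributes to MMD-based selection.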
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11800