Keywords: Active Learning; Subset Selection; quantum acceleration; Maximum Mean Discrepancy
Abstract: Active learning (AL) techniques select the most informative data points from large datasets, enhancing model performance with fewer labeled samples. This makes AL particularly useful when labeled data is scarce or labeling is resource-intensive. However, most effective existing methods rely on uncertainty scores to select samples, often overlooking diversity; this leads to redundant selections, especially when the batch size is small relative to the overall dataset. This paper introduces Efficient Blockwise Diverse Active Learning (EBDAL), a generalizable framework that combines uncertainty with diversity-based selection to overcome these limitations. By partitioning the dataset into blocks via a clustering strategy, we ensure diverse sampling within each block and enable efficient handling of large-scale datasets. To quantify diversity, we minimize the Maximum Mean Discrepancy (MMD) between the selected subset and the full dataset, which we reformulate as a Quadratic Unconstrained Binary Optimization (QUBO) problem. The resulting QUBO objective is submodular, permitting an efficient greedy algorithm. We further demonstrate feasibility on real quantum hardware through an end-to-end selection experiment. Our experiments show that EBDAL not only improves the accuracy of uncertainty-based strategies but also outperforms a wide range of selection methods while achieving substantial computational speedups. These findings highlight EBDAL's robustness, efficiency, and adaptability across various datasets.
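The MMD-minimization and greedy-selection steps described in the abstract can be sketched as follows. This is a minimal illustration under assumed choices (an RBF kernel, the function names `rbf_kernel` and `greedy_mmd_select`, and the `gamma` parameter are all hypothetical), not the paper's actual implementation or its blockwise clustering stage:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def greedy_mmd_select(X, k, gamma=1.0):
    """Greedily pick k indices whose empirical distribution minimizes the
    (squared) MMD to the full dataset. The objective below is the QUBO-style
    form: a quadratic self-similarity term minus a linear coverage term;
    the constant full-data term is dropped since it does not affect argmin."""
    n = len(X)
    K = rbf_kernel(X, gamma)
    mean_k = K.mean(axis=1)  # (1/n) * sum_j K[i, j] -> linear coverage term
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best, best_val = None, np.inf
        for i in remaining:
            cand = selected + [i]
            m = len(cand)
            # MMD^2 up to a constant:
            #   (1/m^2) * sum_{i,j in S} K_ij  -  (2/(m*n)) * sum_{i in S, j} K_ij
            val = K[np.ix_(cand, cand)].sum() / m**2 - 2.0 * mean_k[cand].mean()
            if val < best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset with three well-separated clusters, the greedy objective rewards covering each cluster once before revisiting any of them, which illustrates the diversity behavior the abstract attributes to MMD-based selection.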
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11800