Abstract: In many application domains, including medical imaging, experimental design, and robotics, labeled data are expensive to acquire while unlabeled samples are abundant. Such high labeling costs motivate the active learning (AL) paradigm, which judiciously selects the most informative data instances to label. Specifically, this paper considers a streaming AL setting in which unlabeled samples arrive sequentially and an oracle decides, based on a given criterion, whether or not to label each of them. This active labeling process can benefit from a statistical function model that provides well-calibrated uncertainty values to guide the oracle toward informed labeling decisions. To achieve statistical modeling with adaptivity and robustness in the streaming setting, this work leverages a recently developed ensemble Gaussian process (EGP) model whose weights adapt to the incrementally collected labeled data. Building on this EGP model, a novel labeling criterion is advocated, where the oracle calculates the Kullback-Leibler divergence between the predictive pdfs of each unlabeled instance to make the labeling decision. Numerical tests on synthetic and real datasets for regression showcase the merits of the proposed EGP-AL approach relative to competing alternatives.
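The abstract does not specify exactly which predictive pdfs enter the KL divergence, so the following Python sketch is illustrative only and is not the paper's method. It assumes that each Gaussian process expert in the ensemble returns a Gaussian predictive pdf (mean and variance) for an incoming unlabeled instance. It also assumes that the labeling score is the weight-averaged KL divergence from each expert's pdf to a moment-matched Gaussian summarizing the ensemble, and that an instance is labeled when this score exceeds a threshold. The function names (`disagreement_score`, `should_label`, `update_weights`) and the threshold are hypothetical. The Bayesian weight update is one common way to adapt ensemble weights as labels arrive.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def ensemble_moments(weights, mus, variances):
    """Mean and variance of the weighted Gaussian mixture (moment matching)."""
    mean = sum(w * m for w, m in zip(weights, mus))
    second = sum(w * (v + m * m) for w, m, v in zip(weights, mus, variances))
    return mean, second - mean * mean


def disagreement_score(weights, mus, variances):
    """Hypothetical score: weight-averaged KL from each expert's predictive pdf
    to the moment-matched ensemble predictive pdf."""
    mu_e, var_e = ensemble_moments(weights, mus, variances)
    return sum(w * gaussian_kl(m, v, mu_e, var_e)
               for w, m, v in zip(weights, mus, variances))


def should_label(weights, mus, variances, threshold):
    """Hypothetical rule: query the label when expert disagreement is large."""
    return disagreement_score(weights, mus, variances) > threshold


def update_weights(weights, mus, variances, y):
    """Bayes-rule update of expert weights with each expert's Gaussian
    predictive likelihood of the newly revealed label y."""
    lik = [w * math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
           for w, m, v in zip(weights, mus, variances)]
    total = sum(lik)
    return [l / total for l in lik]
```

For example, two equally weighted experts predicting N(0, 1) and N(2, 1) yield a moment-matched ensemble pdf N(1, 2) and a score of 0.5·ln 2 ≈ 0.347, so `should_label([0.5, 0.5], [0.0, 2.0], [1.0, 1.0], 0.1)` returns `True`. Identical experts give a score of zero, so the instance would not be labeled.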